AD-A120 646

LARGE SAMPLE THEORY FOR SEQUENTIAL ANALYSIS
OF THE PROPORTIONAL HAZARDS MODEL


BY 

THOMAS  SELLKE 


TECHNICAL  REPORT  NO.  20 
AUGUST  1982 


PREPARED  UNDER  CONTRACT 
N00014-77-C-0306  (NR-042-373) 
FOR  THE  OFFICE  OF  NAVAL  RESEARCH 


Reproduction  In  Whole  or  In  Part  Is  Permitted 
for  any  purpose  of  the  United  States  Government 

Approved  for  public  release;  distribution  unlimited. 


DEPARTMENT  OF  STATISTICS 
STANFORD  UNIVERSITY 
STANFORD.  CALIFORNIA 




LARGE  SAMPLE  THEORY  FOR  SEQUENTIAL  ANALYSIS 
OF  THE  PROPORTIONAL  HAZARDS  MODEL

by 

Thomas  Sellke 
Stanford  University 


TECHNICAL  REPORT  NO.  20 
August  1982 


Prepared  Under  the  Auspices 
of 

Office  of  Naval  Research  Contract 
N00014-77-C-0306  (NR-042-373)


DEPARTMENT  OF  STATISTICS 
Stanford  University 
Stanford,  California 




TABLE OF CONTENTS

Chapter I    Introduction and Summary
    1.1  A Review of the Proportional Hazards Model
    1.2  Application of Martingale Theory to the Simultaneous Entry Case
    1.3  Medical Trials with Staggered Entry of i.i.d. Patients

Chapter II   Trials with Simultaneous Entry of Patients
    2.1  Notation and Formulation of the Model
    2.2  Approximation of the Score Process by a Brownian Motion
    2.3  Approximation of the Maximum Partial Likelihood Estimator
         Process by a Brownian Motion

Chapter III  Trials with Staggered Entry and Independent Identically
             Distributed Patients
    3.1  Notation and Formulation of the Model
    3.2  Approximation of $\dot\ell(t,\beta)$ by the Martingale $Q(t)$
    3.3  Approximation of $\langle Q\rangle(t)$ by $-\ddot\ell(t,\beta)$
    3.4  Consistency of $\hat\beta(t)$ and Approximation of
         $-\ddot\ell\{t,\hat\beta(t)\}\{\hat\beta(t)-\beta\}$ by $\dot\ell(t,\beta)$
    3.5  Approximation of the Martingale $Q(t)$ by a Brownian Motion
    3.6  The Main Theorem
    3.7  Multidimensional Covariates

Appendix
    A.1  Basic Facts About Martingales
    A.2  Central Limit and Embedding Theorems for Martingales

References


SUMMARY


An appropriate large sample theory for sequential analysis of
the Cox proportional hazards model is developed. For clinical trials
with simultaneous entry of patients, the efficient score process of
the partial likelihood is easily seen to be a martingale. It follows
that, in a time scale based on the observed Fisher information, the
score process and the properly normalized maximum partial likelihood
estimator behave asymptotically like Brownian motion. When entry is
staggered, the efficient score process is no longer a martingale in
general. However, if patients in a staggered-entry clinical trial
are assumed to be independent and identically distributed,
independently of entry time, then the score process is well
approximated by a martingale. The asymptotic results involving weak
convergence


Key  words:  Proportional  hazards  model,  sequential  analysis. 


CHAPTER  I 

INTRODUCTION  AND  SUMMARY 


The proportional hazards model of survival analysis and its
analysis by the method of partial likelihood originate in the work
of Cox (1972, 1975), who argued that under general conditions
maximum partial likelihood estimators have asymptotically normal
distributions very similar to the asymptotic distributions of ordinary
maximum likelihood estimators. The heuristic arguments given by
Cox, though intuitively compelling, were nonrigorous and somewhat
vague, and since then a number of authors have attempted to justify
Cox's approach rigorously. See, for example, Bailey (1979), Gill
(1980), Tsiatis (1981a), and Andersen and Gill (1981). For the most
part, the work of these authors can be thought of as referring to
medical trials in which all patients enter simultaneously or to
medical trials with staggered entry in which statistical analysis is
only carried out at a single predetermined time. However, the
patients in a medical trial typically do not enter simultaneously,
and uncertainty about the rate at which information will be
accumulated in a clinical trial often makes the idea of choosing a
termination time for the trial in advance rather dubious. In addition,
there are the usual ethical and decision-theoretic arguments for
analyzing the data from a medical trial sequentially so that the
trial may be terminated quickly if large treatment effects appear
to be present (see Armitage (1975)).


The goal of this dissertation is to justify sequential methods
for statistical analysis of the proportional hazards model. Chapter
II will deal with medical trials with simultaneous entry of patients.
Although this context is often unrealistic, the results of Chapter II
show that sequential analysis involving a random rescaling of time
based on the observed Fisher information is a natural approach to the
proportional hazards model. Chapter III obtains results like those of
Chapter II for the case of staggered entry when the covariate and
censoring characteristics of different patients are i.i.d.,
independently of entry time. The Appendix reviews several basic facts
concerning martingales and proves a Skorokhod embedding theorem for
martingales which is used in Chapters II and III.

Chapter I will proceed as follows. In Section 1.1, the
proportional hazards model, the partial likelihood, and Cox's large
sample theory arguments will be reviewed. The Bayesian view of the
problem will also be considered. Section 1.2 will discuss the approach
of Chapter II and compare it to the approach used by Andersen and Gill
(1981) in a similar setting. Section 1.3 will summarize Chapter III
and compare the results to those of Tsiatis (1981b) and Slud (1982),
who also deal with sequential procedures for medical trials with
staggered entry.

1.1.  A  Review  of  the  Proportional  Hazards  Model 

Suppose a medical trial involving $n$ patients is conducted.
Let $\lambda_i(\cdot)$ be the hazard rate for the i-th patient, so that

(1.1)  $\lambda_i(t)\,dt = P\{\text{patient } i \text{ dies in } (t, t+dt) \mid \text{patient } i \text{ is alive at time } t\}$.

The Cox proportional hazards model assumes that the hazard rate for
the i-th patient has the form

(1.2)  $\lambda_i(t) = \lambda_0(t)\,\exp(\beta' z_i)$

for $t \ge 0$. Here, $\lambda_0(\cdot)$ is a baseline hazard rate, and $t$ is the
elapsed time after entry into the medical trial. The components of
the p-vector $z_i$ are observable covariate values for the i-th
patient, and $\beta$ is a p-vector of unknown parameters. In general,
$z_i$ is permitted to vary with time, but in this chapter we will
assume for simplicity that the $z_i$'s are constant in time. The
model allows (right) censoring of patients, with the obvious
restriction that the censoring not "anticipate" deaths. The goal is to
do statistical inference on $\beta$ so as to determine how the covariate
vector $z_i$ influences the hazard rate. The unknown function
$\lambda_0(\cdot)$ is regarded here as an infinite-dimensional nuisance
parameter.

Suppose first that all $n$ patients enter the trial
simultaneously and that we have observed the trial over the time
interval $[0,t]$. Let $x_i(t)$ be the amount of time patient $i$ was under
observation prior to possible censoring or death. Suppose $m = m(t)$
deaths were observed at times $t_{(1)} < t_{(2)} < \cdots < t_{(m)}$, where
patient $(j)$ is the patient observed to die at time $t_{(j)}$. The results
of such a medical trial up to time $t$ may be summarized as in Figure
1.1. Define the risk set $R_j$ of the j-th death by

(1.3)  $R_j = \{i : x_i(t) \ge t_{(j)}\}$.

Thus, $R_j$ consists of those patients who were in the trial and
therefore at risk just prior to the j-th observed death.

Let us now review the definition and the rationale for Cox's
partial likelihood. Let $H_{t_{(j)}-}$ represent the observations in the
time interval $[0, t_{(j)})$ plus the occurrence of a death at time
$t_{(j)}$. Let $H_{t_{(j)}}$ represent the observations in $[0, t_{(j)}]$. Thus,
$H_{t_{(j)}}$ includes the identity of the patient $(j)$, while $H_{t_{(j)}-}$ does
not. For notational convenience, set $t_{(0)} = 0$ and $t_{(m+1)} = t$.
Then Cox decomposes the likelihood function of the observations in
$[0,t]$ as follows, where the probabilities¹ depend on $\lambda_0(\cdot)$, $\beta$, and
the distributions of the censoring times.

(1.4)  $P(H_{t_{(m+1)}}) = \left[\prod_{k=1}^{m+1} P\{H_{t_{(k)}-} \mid H_{t_{(k-1)}}\}\right] \left[\prod_{j=1}^{m} P\{H_{t_{(j)}} \mid H_{t_{(j)}-}\}\right]$.

Note that $P\{H_{t_{(k)}-} \mid H_{t_{(k-1)}}\}$ is the conditional probability of
the event "no deaths in $(t_{(k-1)}, t_{(k)})$, observed censoring occurs in
$(t_{(k-1)}, t_{(k)})$, and a death occurs at time $t_{(k)}$", given the
observations in $[0, t_{(k-1)}]$. If one writes out this conditional
probability in terms of $\beta$, $\lambda_0(\cdot)$, and the distributions of the
censoring times, one gets a messy expression from which it does not
seem possible to extract much information about $\beta$ as long as the
function $\lambda_0(\cdot)$ remains unknown. The factors in the second product
are much more useful.

¹The term "probability" is being used rather loosely here, since these
"probabilities" may include values of density functions.

(1.5)  $P\{H_{t_{(j)}} \mid H_{t_{(j)}-}\} = P\{\text{patient } (j) \text{ dies at time } t_{(j)} \mid R_j \text{ and a death occurs at time } t_{(j)}\} = \dfrac{\exp\{\beta' z_{(j)}\}}{\sum_{i \in R_j} \exp\{\beta' z_i\}}$.

The factors $\lambda_0(t_{(j)})$ have been cancelled out of (1.5), so that $\beta$ is
the only unknown remaining in (1.5). This second product in the
likelihood (1.4),

(1.6)  $PL(t,\beta) = \prod_{j=1}^{m} \dfrac{e^{\beta' z_{(j)}}}{\sum_{i \in R_j} e^{\beta' z_i}}$,

is Cox's partial likelihood. Cox (1975) argues that one should ignore
the other factor of the likelihood and use this partial likelihood in
the same way that one uses an ordinary likelihood. Use of the partial
likelihood is supported by Efron (1977), who shows that all but a
small part of the information about $\beta$ is typically contained in the
partial likelihood even when the parametric form of $\lambda_0(\cdot)$ is known.
Thus, the loss of information about $\beta$ caused by use of the partial
likelihood should be quite negligible when $\lambda_0(\cdot)$ is completely
unknown. Moreover, use of the partial likelihood in the case where
no prior knowledge or only vague prior knowledge about $\lambda_0(\cdot)$ is
available seems unavoidable if the problem is to be tractable.
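
As an illustration of (1.6), the following minimal Python sketch evaluates the log partial likelihood for scalar $\beta$ and scalar covariates under simultaneous entry. The function name, the argument layout, and the toy data are assumptions made for this example only; tied death times are assumed absent.

```python
import math

def log_partial_likelihood(beta, z, x, died):
    """Log of the partial likelihood (1.6), scalar beta and covariates.

    z[i]    : covariate of patient i (constant in time)
    x[i]    : time patient i was under observation
    died[i] : True if observation of patient i ended in death
    Assumes no tied death times (an assumption of this sketch).
    """
    total = 0.0
    for j in range(len(z)):
        if not died[j]:
            continue
        # Risk set R_j: patients still under observation just before t_(j).
        risk = [i for i in range(len(z)) if x[i] >= x[j]]
        log_denom = math.log(sum(math.exp(beta * z[i]) for i in risk))
        total += beta * z[j] - log_denom
    return total

# Hypothetical toy data: 4 patients, patient 2 is censored at time 3.0.
z = [1.0, 0.0, 1.0, 0.0]
x = [2.0, 5.0, 3.0, 7.0]
died = [True, True, False, True]
value_at_zero = log_partial_likelihood(0.0, z, x, died)
```

Only the ordering of the observation times enters the computation, which reflects the cancellation of the baseline hazard in (1.5): any increasing transformation of the time axis leaves the value unchanged.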

Under the proportional hazards model, one can view such a
medical trial as a random series of experiments. An experiment
consists of having "nature" choose one person to die from the
current risk set, where the probability that a given patient is chosen
is proportional to this patient's value of $e^{\beta' z_i}$. The experiments
themselves are random in that the risk sets in question and the
death times are random. The partial likelihood is the combined
likelihood for the outcomes of these experiments and ignores the
randomness involved in the determination of which experiments are
performed and when they are performed. The general situation of which
this is a special case is a sequence of random experiments, where the
conditional probability distribution of the outcome of an experiment,
given previous observations, only depends on a parameter $\beta$, but
where the choice of the next experiment to be performed is random,
perhaps with dependence on the results of previous experiments, on
$\beta$, and on nuisance parameters. The use of a partial likelihood for
such sequences of random experiments is the subject of Cox (1975).
Although the discussion that follows will refer to the proportional
hazards model, the results for medical trials with simultaneous entry
of patients will apply directly to the general situation.
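
A single "experiment" of the kind just described can be sketched in Python as follows. The helper names and the example covariates are hypothetical, and the sketch assumes scalar $\beta$ and scalar covariates.

```python
import math
import random

def death_probabilities(beta, z_risk):
    """Probability that each member of the risk set is the one to die.

    Each patient is weighted proportionally to exp(beta * z), as in (1.5).
    """
    weights = [math.exp(beta * zi) for zi in z_risk]
    total = sum(weights)
    return [w / total for w in weights]

def choose_death(beta, z_risk, rng):
    """Let 'nature' pick the index of the dying patient in the risk set."""
    probs = death_probabilities(beta, z_risk)
    return rng.choices(range(len(z_risk)), weights=probs, k=1)[0]

# Hypothetical risk set of three patients; beta = log 2 doubles the
# hazard of the patient with z = 1 relative to those with z = 0.
example_probs = death_probabilities(math.log(2.0), [1.0, 0.0, 0.0])
example_index = choose_death(math.log(2.0), [1.0, 0.0, 0.0], random.Random(0))
```

The baseline hazard plays no role in this choice; it only affects when the experiment takes place.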

For a Bayesian who is willing to use the partial likelihood,
the problem of evaluating data from a medical trial is easy under
this model. The Bayesian just multiplies his prior density for $\beta$
by the partial likelihood, and the normalized product becomes his
posterior density for $\beta$. Use of the partial likelihood avoids the
need to specify a prior distribution for $\lambda_0(\cdot)$. The Bayesian may
still be faced with the decision of when to stop a medical trial,
but this involves matters, such as loss functions and the cost of
sampling, which will not be explicitly considered here.

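Cox (1975) suggested that the usual large-sample theory for
ordinary likelihoods be applied to the partial likelihood. The
logarithm of the partial likelihood is given by

(1.7)  $\ell(t,\beta) = \log PL(t,\beta) = \sum_{j=1}^{m(t)} \Big\{ \beta' z_{(j)} - \log \sum_{i \in R_j} e^{\beta' z_i} \Big\}$.

For one-dimensional $\beta$ and $z_i$, the efficient score is given by

(1.8)  $\dot\ell(t,\beta) = \dfrac{\partial}{\partial\beta} \log PL(t,\beta) = \sum_{j=1}^{m(t)} \{ z_{(j)} - A_{(j)}(\beta) \}$,

where

(1.9)  $A_{(j)}(\beta) = \dfrac{\sum_{i \in R_j} z_i\, e^{\beta z_i}}{\sum_{i \in R_j} e^{\beta z_i}}$.

The observed Fisher information of the partial likelihood is given by

(1.10)  $-\ddot\ell(t,\beta) = -\dfrac{\partial^2}{\partial\beta^2} \log PL(t,\beta) = \sum_{j=1}^{m(t)} V_{(j)}(\beta)$,

where

(1.11)  $V_{(j)}(\beta) = \dfrac{\sum_{i \in R_j} \{z_i - A_{(j)}(\beta)\}^2\, e^{\beta z_i}}{\sum_{i \in R_j} e^{\beta z_i}}$.

Note that $A_{(j)}(\beta)$ is the weighted average value of $z_i$ in $R_j$,
where each patient is weighted proportionally to $e^{\beta z_i}$. Thus, the
increments of $\dot\ell(t,\beta)$ are equal to

(1.12)  $z_{(j)} - E_\beta(z_{(j)} \mid R_j)$.

It is also easy to see that

(1.13)  $V_{(j)}(\beta) = \mathrm{Var}_\beta(z_{(j)} \mid R_j)$.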

Cox (1975) claims that for large $m$

$\dot\ell(t,\beta_0)\,\{-\ddot\ell(t,\beta_0)\}^{-1/2}$

has approximately a $N(0,1)$ distribution when $\beta_0$ is the true value
of $\beta$. Furthermore, "under weak conditions on the third derivative
of the log likelihood", he claims that

$(\hat\beta - \beta_0)\,\{-\ddot\ell(t,\hat\beta)\}^{1/2}$

has approximately a $N(0,1)$ distribution, where $\hat\beta$ is the maximum
partial likelihood estimate of $\beta_0$. These results immediately
generalize to the case of p-dimensional $\beta$ and $z_i$. The efficient score

$\dot\ell(t,\beta) = \nabla_\beta \log PL(t,\beta)$

is now a $p \times 1$ vector, and the observed Fisher information

$-\ddot\ell(t,\beta) = -(\nabla_\beta)^2 \log PL(t,\beta)$

is now a $p \times p$ matrix. The formulas (1.8) and (1.9) for $\dot\ell(t,\beta)$
remain valid, and (1.10) remains valid if the square $\{z_i - A_{(j)}(\beta)\}^2$
in (1.11) is replaced by the $p \times p$ matrix

$\{z_i - A_{(j)}(\beta)\}\,\{z_i - A_{(j)}(\beta)\}^T$.

Formula (1.12) still holds, and (1.13) becomes

(1.14)  $V_{(j)}(\beta) = \mathrm{Cov}_\beta(z_{(j)} \mid R_j)$.

The asymptotic distributional results are the same with $N(0,1)$
replaced by the p-dimensional standard normal distribution
$N(0, I_{p \times p})$.

In Cox's heuristic argument for the asymptotic distributions,
he treats the number of deaths $m$ as constant. The increments (1.12)
are shown to be uncorrelated and with mean 0. He also assumes "some
degree of independence" between the increments (1.12) of $\dot\ell(t,\beta)$ and
that the increments $V_{(j)}(\beta)$ of $-\ddot\ell(t,\beta)$ are "not too disparate" in
size. Under these somewhat vague conditions, the central limit
theorem presumably applies.
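
The quantities in (1.8) through (1.11) can be computed directly. The following Python sketch does so for the scalar case; the function name and the toy data are assumptions introduced for illustration, and tied death times are assumed absent.

```python
import math

def score_and_information(beta, z, x, died):
    """Return (score, information) as in (1.8) and (1.10), scalar case."""
    score, info = 0.0, 0.0
    n = len(z)
    for j in range(n):
        if not died[j]:
            continue
        # Risk set R_j for the death of patient j.
        risk = [i for i in range(n) if x[i] >= x[j]]
        w = [math.exp(beta * z[i]) for i in risk]
        w_sum = sum(w)
        # A_(j)(beta): weighted average of z over the risk set, as in (1.9).
        a_j = sum(wi * z[i] for wi, i in zip(w, risk)) / w_sum
        # V_(j)(beta): weighted variance of z over the risk set, as in (1.11).
        v_j = sum(wi * (z[i] - a_j) ** 2 for wi, i in zip(w, risk)) / w_sum
        score += z[j] - a_j
        info += v_j
    return score, info

# Hypothetical toy data: 4 patients, patient 2 is censored at time 3.0.
z = [1.0, 0.0, 1.0, 0.0]
x = [2.0, 5.0, 3.0, 7.0]
died = [True, True, False, True]
score0, info0 = score_and_information(0.0, z, x, died)
```

A numerical derivative of the score with respect to $\beta$ should agree with minus the information, which gives a quick consistency check on any implementation of these formulas.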

1.2.  Application  of  Martingale  Theory  to  the  Simultaneous  Entry  Case 
It is not hard to see martingales lurking in the background in
Cox's (1975) argument, and it seems to have become generally accepted
that martingale theory is the natural mathematical setting in which
to investigate the large-sample theory of the partial likelihood.
(cf. Aalen (1977, 1978, 1980), Gill (1980), and Andersen and Gill
(1981).) Each coordinate of the efficient score process $\dot\ell(t,\beta)$
evaluated at the true value of $\beta$ is easily seen to be a martingale
in $t$, where the $\sigma$-algebra $F_t$ is generated by events in $[0,t]$.
The information process $-\ddot\ell(t,\beta)$ is the sum of the conditional
(i.e., given $F_{t_{(j)}-}$) covariances $V_{(j)}(\beta)$ of the increments of
$\dot\ell(t,\beta)$. Thus, the asymptotic normality of the efficient score
$\dot\ell(t,\beta)$ is largely a question of whether a martingale central limit
theorem applies. The basic requirements for applicability of a
martingale central limit theorem to $\dot\ell(T,\beta)$, where $T$ is a stopping
time, are that $-\ddot\ell(T,\beta)$ be approximately constant (or, if random,
approximately independent of the martingale process) and that the
jumps of $\dot\ell(t,\beta)$, $t \le T$, be small compared to $\{-\ddot\ell(T,\beta)\}^{1/2}$.
The second requirement is a Lindeberg condition.

At least two approaches for ensuring the applicability of a
martingale central limit theorem to the efficient score process
$\dot\ell(t,\beta)$ are possible. The first, used by Andersen and Gill (1981),
requires that the observed information matrix $-\ddot\ell(t,\beta)$ grow in an
essentially nonrandom way. To be specific, let $I(t)$, $0 \le t \le 1$,
be a fixed, continuous $p \times p$ matrix valued function which is
nondecreasing in the sense that $I(t_2) - I(t_1)$ is nonnegative definite
for $t_2 > t_1$, and for which $I(1)$ is positive definite. Suppose
that for each $n = 1, 2, 3, \ldots$, we have a medical trial involving $n$
patients, all of whom enter the trial at time $t = 0$. The assumptions
in Andersen and Gill (1981) imply that, as $n \to \infty$,

$\sup_{t \in [0,1]} \big\| I(t) - n^{-1}\{-\ddot\ell_n(t,\beta)\} \big\| \xrightarrow{P} 0$,

where $-\ddot\ell_n(t,\beta)$ is the observed Fisher information process for the
n-th trial. This result, together with a Lindeberg condition, implies
that $n^{-1/2}\,\dot\ell_n(t,\beta)$ converges weakly to a p-dimensional, independent
increments Gaussian process with mean 0 and covariance matrix $I(t)$
at time $t$. If, in addition,

$n^{-1}\{-\ddot\ell_n(1,\beta^*)\} \to I(1)$

whenever $\beta^*$ is a consistent estimator of $\beta$, then

$n^{1/2}\{\hat\beta(1) - \beta\} \to N(0, I^{-1}(1))$,

where $\hat\beta(1)$ is the maximum partial likelihood estimator of $\beta$ at
$t = 1$. Andersen and Gill show that their assumptions are generally
satisfied when the patients are independent and identically
distributed with respect to covariates and censoring.

Suppose that $z_i$ and $\beta$ are one-dimensional. The approach
used in Chapter II of this dissertation for guaranteeing the
applicability of a martingale central limit theorem in the simultaneous
entry case is to use the accumulated information $-\ddot\ell(t,\beta)$ as a clock
time. Thus, by definition, information accumulates at a constant
rate in this clock time. All that is needed in addition is a
Lindeberg condition implying that the jumps of the $\dot\ell(t,\beta)$ and
$-\ddot\ell(t,\beta)$ processes are not too big. Chapter II will assume that the
covariates $z_i$ are bounded in absolute value by a fixed constant $B$, so
that the necessary Lindeberg condition is trivially satisfied. It
follows that, in the information time induced by $-\ddot\ell(t,\beta)$, the
efficient score process $\dot\ell(t,\beta)$ and the properly normalized maximum
partial likelihood estimator process

$\{-\ddot\ell(t,\beta)\}^{1/2}\,\{\hat\beta(t) - \beta\}$

are well approximated by a Brownian motion, and the approximation is
shown to be uniformly good for medical trials satisfying $|z_i| \le B$.
If one is willing to work in this information time, the problem of
sequentially estimating $\beta$ or testing a hypothesis $H_0\colon \beta = \beta_0$
becomes asymptotically equivalent to sequentially estimating or
testing the drift of a Brownian motion. The major disadvantage of this
approach is that it does not generalize to the case of
multidimensional $z_i$ and $\beta$, since a $p \times p$ information matrix cannot be
used as a clock time. However, the results of Chapter II illustrate that
it is natural to operate in a clock time measuring information when
the rate at which information will accumulate is unknown and perhaps
random.
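
To make the idea of information time concrete, the following Python sketch records the score process against accumulated information after each death and reports the first point at which the information reaches a chosen target. It is only an illustration: the function names, the stopping rule, and the toy data are assumptions of this example, and $\beta$ and the covariates are scalar.

```python
import math

def score_path_in_information_time(beta0, z, x, died):
    """Return [(information, score)] after each death, in order of death time."""
    n = len(z)
    deaths = sorted((j for j in range(n) if died[j]), key=lambda j: x[j])
    path, score, info = [], 0.0, 0.0
    for j in deaths:
        risk = [i for i in range(n) if x[i] >= x[j]]
        w = [math.exp(beta0 * z[i]) for i in risk]
        w_sum = sum(w)
        a_j = sum(wi * z[i] for wi, i in zip(w, risk)) / w_sum
        v_j = sum(wi * (z[i] - a_j) ** 2 for wi, i in zip(w, risk)) / w_sum
        score += z[j] - a_j   # jump of the score process
        info += v_j           # jump of the information clock
        path.append((info, score))
    return path

def first_crossing(path, info_target):
    """First (information, score) point with information >= info_target."""
    for info, score in path:
        if info >= info_target:
            return info, score
    return None  # the target amount of information was never reached

# Hypothetical trial: six patients with |z| <= 1, all observed to die.
z = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
died = [True] * 6
path = score_path_in_information_time(0.0, z, x, died)
stop = first_crossing(path, 0.7)
```

With covariates bounded by $B$ in absolute value, each jump of the information clock in this sketch is at most $B^2$, so the clock cannot overshoot a target by more than that amount.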

1.3.  Medical  Trials  with  Staggered  Entry  of  i.i.d.  Patients 

Suppose that we have a medical trial like the one described in
Section 1.1, except that the i-th patient now enters the trial at
time $y_i$, where $0 \le y_1 \le y_2 \le \cdots \le y_n$. The observations of such
a trial up until time $t$ may be summarized graphically as in Figure
1.2. Let $x_i(t)$ again be the amount of time patient $i$ was on
test before time $t$ prior to possible censoring or death. Let
$\Delta_i(t)$ equal 1 if patient $i$ was observed to die before time $t$,
and let $\Delta_i(t)$ equal 0 otherwise. Note that $x_i(t)$ cannot be
greater than $(t - y_i)^+$, so that at time $t$ patient $i$ is in
effect censored at age $(t - y_i)^+$. If we assume that the entry
times are constant or have a distribution not depending on $\beta$, then
the only data from patient $i$ relevant to inference about $\beta$ is
the triple $\{z_i, x_i(t), \Delta_i(t)\}$. Thus, a simultaneous entry medical
trial with the same values of $\{z_i, x_i(t), \Delta_i(t)\}$ as the trial
described above would give the same information about $\beta$. The
simultaneous entry trial of Figure 1.3 is equivalent to the staggered
entry trial of Figure 1.2.

Figure 1.3. Survival times for an equivalent trial with simultaneous
entry. Patients 4 and 5 are treated as censored when $PL(t)$ is
computed. Note that the risk set for the death of patient 3 changes if
the original trial of Figure 1.2 is continued beyond time $t$.

The partial likelihood for staggered entry data is defined to equal
the partial likelihood for the equivalent simultaneous entry data.
Define


(1.14)  $R(t,s) = \{i : x_i(t) \ge s\}$,

so that $R(t,s)$ is the set of patients who by time $t$ have been on
test for at least $s$ time units. If $\Delta_i(t) = 1$, then $R\{t, x_i(t)\}$
is the risk set at time $t$ for the death of patient $i$. Note that
the risk set for a death can now vary with $t$. The efficient score
of the partial likelihood is given by

(1.15)  $\dot\ell(t,\beta) = \sum_{i=1}^{n} \big[ z_i - \mu_\beta\{t, x_i(t)\} \big]\, \Delta_i(t)$,

where

(1.16)  $\mu_\beta(t,s) = \dfrac{\sum_{j \in R(t,s)} z_j\, e^{\beta' z_j}}{\sum_{j \in R(t,s)} e^{\beta' z_j}}$.

The observed Fisher information of the partial likelihood is

(1.17)  $-\ddot\ell(t,\beta) = \sum_{i=1}^{n} \sigma_\beta^2\{t, x_i(t)\}\, \Delta_i(t)$,

where

(1.18)  $\sigma_\beta^2(t,s) = \dfrac{\sum_{j \in R(t,s)} [z_j - \mu_\beta(t,s)]\,[z_j - \mu_\beta(t,s)]^T\, e^{\beta' z_j}}{\sum_{j \in R(t,s)} e^{\beta' z_j}}$.

Note that $\mu_\beta(t,s)$ is the weighted average of covariates for patients
in $R(t,s)$, and $\sigma_\beta^2(t,s)$ is the covariance matrix for this weighted
distribution of covariate vectors.
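
The definitions (1.14) through (1.18) can be illustrated with a short Python sketch for scalar $\beta$ and covariates. All names and the three-patient data set are hypothetical; ages at death and at censoring are measured from each patient's entry, and `math.inf` stands for "never censored".

```python
import math

def observed_data(t, y, d, c):
    """x_i(t) and Delta_i(t) at calendar time t.

    y[i]: entry time, d[i]: age at death, c[i]: age at censoring.
    """
    xs, deltas = [], []
    for yi, di, ci in zip(y, d, c):
        age_limit = max(t - yi, 0.0)           # (t - y_i)^+
        xs.append(min(di, ci, age_limit))      # time on test so far
        deltas.append(1 if di <= min(ci, age_limit) else 0)
    return xs, deltas

def risk_set(xs, s):
    """R(t, s) of (1.14): patients on test for at least s time units."""
    return [i for i, xi in enumerate(xs) if xi >= s]

def staggered_score_and_information(beta, t, z, y, d, c):
    """Score (1.15) and information (1.17) at calendar time t."""
    xs, deltas = observed_data(t, y, d, c)
    score, info = 0.0, 0.0
    for i, (xi, delta) in enumerate(zip(xs, deltas)):
        if not delta:
            continue
        risk = risk_set(xs, xi)
        w = [math.exp(beta * z[j]) for j in risk]
        w_sum = sum(w)
        mu = sum(wj * z[j] for wj, j in zip(w, risk)) / w_sum              # (1.16)
        var = sum(wj * (z[j] - mu) ** 2 for wj, j in zip(w, risk)) / w_sum  # (1.18)
        score += z[i] - mu
        info += var
    return score, info

# Hypothetical trial: patient 1 enters late, at calendar time 3.
z = [1.0, 0.0, 0.0]
y = [0.0, 3.0, 0.0]
d = [2.0, 5.0, 10.0]
c = [math.inf, math.inf, math.inf]
score_t4, info_t4 = staggered_score_and_information(0.0, 4.0, z, y, d, c)
score_t6, info_t6 = staggered_score_and_information(0.0, 6.0, z, y, d, c)
```

In this toy example the only death observed by either time is that of patient 0, yet its risk set grows from two patients at $t = 4$ to three at $t = 6$ once the late entrant has been on test long enough, so the same death contributes differently to the score at the two analysis times.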

If we are only going to analyze data from a staggered entry
trial at a single fixed time $t$, then the above discussion shows
that the problem is equivalent to analyzing data from a simultaneous
entry trial at a single time. The results of Andersen and Gill
(1981) described in the last section are very useful in this case.
A Bayesian would also see little difference between simultaneous
entry and staggered entry, since the procedure for updating a prior
density for $\beta$ would be the same in either case. However, the
behavior of $\dot\ell(t,\beta)$ and of $\hat\beta(t)$ as processes in $t$ is much more
difficult to analyze under staggered entry, since $\dot\ell(t,\beta)$ is not
generally a martingale. The efficient score $\dot\ell(t,\beta)$ is still a sum
over observed deaths of the difference between the covariate of the
dying patient and the weighted average of covariates in the risk set,
but the risk set no longer consists of those patients who were on
test at the time of death, and the risk sets even change with $t$.
Thus, the interpretation of the trial as a sequence of random
experiments no longer makes sense.

Although Jones and Whitehead (1979) proposed a sequential test
for staggered entry trials and did a computer simulation of their
procedure, the first theoretical result on the joint distribution
of $\dot\ell(t,\beta)$ is due to Tsiatis (1981b). Tsiatis assumes that $n$
patients have i.i.d. entry times distributed on a finite interval.
The patients also have i.i.d. one-dimensional covariates which are
constant in time and independent of the entry times. Censoring is
not allowed. If $n \to \infty$, with the entry time distribution, the
covariate distribution, and the hazard function held fixed, Tsiatis
shows that, under the null hypothesis $H_0\colon \beta = 0$, the joint
distribution of $n^{-1/2}\,\dot\ell(t,0)$ at fixed times $t_1, t_2, \ldots, t_k$ converges
in law to a multivariate normal distribution with mean 0 and
independent increments. The asymptotic variance of $n^{-1/2}\,\dot\ell(t_i,0)$ is
shown to be proportional to $P\{\Delta_1(t_i) = 1\}$. The proof is based on a
clever decomposition of $\dot\ell(t,0)$, which, in the notation of Chapter III,
can be written as

(1.19)  $\dot\ell(t,0) = Q(t) + r(t)$.

The $Q(t)$ process is a sum of $n$ i.i.d. martingales, one for each
patient. The term $r(t)$ is shown to be $o(n^{1/2})$ in probability for
fixed $t$. The result follows easily from the multivariate central
limit theorem. There are several weaknesses in Tsiatis' result.
On the one hand, the assumptions are extremely strong. On the other
hand, the conclusion is rather weak and not well suited to some
applications. The theorem specifies that the times $t_1, \ldots, t_k$
must be fixed. Thus, they must be chosen in advance, even though
one may have little knowledge of how fast information will
accumulate from the trial. Tsiatis proposes that the times be chosen
so that

(1.20)  $P\{\Delta_1(t_i) = 1\} = i/k, \quad i = 1, 2, \ldots, k$,

but this would demand good prior knowledge about both the entry time
distribution and the hazard rate.

Chapter III of this dissertation proves results for staggered
entry trials very similar to those given in Chapter II for
simultaneous entry trials. Again, in the information time induced by
$-\ddot\ell(t,\beta)$, the score process $\dot\ell(t,\beta)$ and the normalized maximum
partial likelihood estimator process

$\{-\ddot\ell(t,\beta)\}^{1/2}\,\{\hat\beta(t) - \beta\}$

are well approximated by a Brownian motion. The conditions are much
weaker than those of Tsiatis. As in Chapter II, the covariates are
assumed to be one-dimensional and bounded in absolute value by a
fixed constant $B$. The $z_i$'s are allowed to vary with time, and
right censoring is permitted. The central distributional
assumption is that the patients are i.i.d. with respect to covariates
and censoring, independently of entry times. The entry times themselves
are treated as ancillary.


The proof of the theorem in Chapter III is long and technical,
but it can be split into three parts:

(a)  The efficient score process $\dot\ell(t,\beta)$ is shown to be
     close to a martingale $Q(t)$. The observed Fisher
     information process $-\ddot\ell(t,\beta)$ is shown to be close
     to $\langle Q\rangle(t)$, the predictable quadratic variation
     process of $Q(t)$. The proofs use a generalized version
     of (1.19).

(b)  Consistency of $\hat\beta(t)$ as an estimator of $\beta$ follows
     from part (a). A Taylor series argument shows that
     $\{-\ddot\ell(t,\beta)\}^{1/2}\,\{\hat\beta(t) - \beta\}$ is close to $\dot\ell(t,\beta)$.

(c)  The martingale $Q(t)$ is well approximated by a
     Brownian motion in the clock time induced by
     $\langle Q\rangle(t)$. The proof uses a Skorokhod embedding
     theorem for martingales proved in the Appendix.

Putting (a) and (c) together implies that $\dot\ell(t,\beta)$ is well
approximated by a Brownian motion in information time. By (b), the same
holds for $\{-\ddot\ell(t,\beta)\}^{1/2}\,\{\hat\beta(t) - \beta\}$. Subject to an innocuous
technical condition, the approximation by Brownian motion is uniformly
good for trials satisfying $|z_i| \le B$ and with $\beta$ in a given compact
interval. The final section of Chapter III discusses the
generalization of these results to the multivariate case.

The recent manuscript of Slud (1982) also attempts to show
that the efficient score process under staggered entry converges
weakly to a Brownian motion in a suitable time scale. Slud
introduces a martingale which is similar to our $Q(t)$ to approximate
the score process. He considers only $\beta = 0$ and uses a time
renormalization which would be inappropriate for general $\beta$. Also, what
corresponds to our Proposition 3.1 is essentially his assumption
A.5. This assumption is never actually verified, although Slud
states that it can be verified under various sets of conditions,
all of which require strong hypotheses on the arrival process.


CHAPTER  II 

TRIALS  WITH  SIMULTANEOUS  ENTRY  OF  PATIENTS 


The purpose of this chapter is to show that asymptotic normality
of the maximum partial likelihood estimator holds in great generality
when the following three requirements are met.

(2.1a)  The covariate processes $z_i(\cdot)$ are one-dimensional.

(2.1b)  The hazard rate for the i-th patient has the form
$\lambda_0(t)\,\exp(\beta z_i(t))$, where $t$ is now the calendar time after the
beginning of the trial.

(2.1c)  The estimation is done sequentially in a clock time
(called information time) measuring observed Fisher information.

When conditions (2.1a) and (2.1b) are satisfied, the efficient
score process is generally a one-dimensional martingale. The
information time of (2.1c) is equal to the sum of the conditional
variances of the jumps of this efficient score martingale. Weak
convergence of the efficient score process in this information time to
standard Brownian motion follows from a martingale central limit
theorem. A Taylor series argument shows that the maximum partial
likelihood estimator process in information time also looks like a
Brownian motion when the estimator process is properly normalized.
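
The Taylor series step can be sketched as follows. This is only an outline under assumptions not stated at this point in the text: that $\hat\beta(t)$ solves the score equation $\dot\ell\{t,\hat\beta(t)\} = 0$ and that $\dot\ell(t,\cdot)$ is continuously differentiable. The intermediate point $\beta^*$ is a symbol introduced here for illustration.

```latex
\begin{align*}
0 = \dot\ell\{t,\hat\beta(t)\}
  &= \dot\ell(t,\beta) + \ddot\ell(t,\beta^*)\,\{\hat\beta(t)-\beta\}
  && \text{for some } \beta^* \text{ between } \beta \text{ and } \hat\beta(t), \\
\text{so that}\quad
  -\ddot\ell(t,\beta^*)\,\{\hat\beta(t)-\beta\} &= \dot\ell(t,\beta).
\end{align*}
```

If $\hat\beta(t)$ is consistent and the information varies smoothly in $\beta$, replacing $\beta^*$ by $\beta$ should change the left side only slightly, which is roughly how the estimator process inherits the Brownian behavior of the score process.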

If the hazard rate for patients in a medical trial is a
function of covariates and of time after entry into the trial, then
(2.1b) demands that all patients enter simultaneously. Alternatively,
(2.1b) is satisfied if the hazard rate for an individual under study
depends on covariates and on varying environmental influences which at
any time affect all individuals at risk equally. An example would be
the following model for the occurrence of auto accidents among the
people driving in a given area. (Since the version of the proportional
hazards model used here implies that simultaneous events do not occur,
one could eliminate simultaneous accidents by saying that a multiple
vehicle accident only counts against the driver who is most at fault.)
The intensity of the Poisson-like accident process for a person who is
driving at time $t$ could be assumed to have the form
$\lambda_0(t)\,\exp(\beta z(t))$, where $z(t)$ gives the level of alcohol in the
driver's blood, and $\lambda_0(t)$ depends on environmental conditions to which
all people driving in the area at time $t$ are equally subject, such as
weather, time of day, and traffic conditions. At any time $t$, the risk
set would be the set of all people actually driving in the area. A
person not driving at time $t$ is considered to be censored. Since it is
possible for a person to cause more than one accident and since the
censoring process is more complicated than the usual simple right
censoring, it would be necessary to use the counting process
formulation of the Cox model found in Andersen and Gill (1981) and
described below.

2.1.  Notation  and  Formulation  of  the  Model 

As  was  indicated  above,  the  method  of  this  chapter  requires  that 
the  efficient  score  process  be  a  one-dimensional  martingale.  A  rather 
general  formulation  of  the  Cox  model  which  satisfies  this  requirement 
when  the  covariates  are  one-dimensional  is  given  in  Andersen  and  Gill 
(1981). 


Following Andersen and Gill (1981), suppose we have a medical
trial involving n patients, to each of whom there corresponds a
counting process $N_i(\cdot)$ which counts observed events in the life
of the i-th individual. These events could be deaths, so that at
most one event is observed for each individual, or they could be
recurrent events, such as epileptic seizures or outbreaks of a rash.
The sample functions $N_i(\cdot)$ look like those of a Poisson process in
that they are increasing, right-continuous step functions which
increase by jumps of size +1 and satisfy $N_i(0) = 0$. It is required
that $N_i(\cdot)$ and $N_j(\cdot)$ do not jump simultaneously for $i \ne j$.

Let $\{\Omega, \mathcal{F}, P\}$ be the probability space on which the
stochastic processes $N_i(t)$ are defined for $t \ge 0$. Let
$\mathcal{F}_t$ be the $\sigma$-algebra generated by everything that
happens in [0,t], so that $\{\mathcal{F}_t,\ t \in [0,\infty]\}$ is a
right-continuous, nondecreasing family of $\sigma$-algebras. It is
assumed that $E\,N_i(t) < \infty$ for $t < \infty$ and that the counting
process $N_i(\cdot)$ has a random intensity function

(2.1)  $\lambda_i(t) = C_i(t)\,\lambda_0(t)\,e^{\beta z_i(t)}, \quad t \ge 0.$

Here $\beta$ is a scalar parameter, $z_i(\cdot)$ is the one-dimensional
covariate process for patient i, and $\lambda_0(\cdot)$ is a fixed
baseline hazard function. The censoring process $C_i(\cdot)$ equals 1
when patient i is under observation and is 0 otherwise. The processes
$z_i(\cdot)$ and $C_i(\cdot)$ are assumed to be $\mathcal{F}_t$-predictable.
This condition is satisfied if these processes are adapted and
left-continuous with right-hand limits. It will also be assumed that
the $z_i(\cdot)$ processes are bounded in absolute value by a fixed
constant B. The situation described here will be referred to as a
B-experiment, and the asymptotic results of this chapter will be found
to hold uniformly in B-experiments. The assumption that (2.1) is the
intensity process for $N_i(\cdot)$ means that


(2.2)  $M_i(t) = N_i(t) - \int_0^t \lambda_i(u)\,du$

is a local $\mathcal{F}_t$-martingale with predictable quadratic variation

(2.3)  $\langle M_i(t), M_i(t)\rangle = \int_0^t \lambda_i(u)\,du$

and that $M_i(\cdot)$ and $M_j(\cdot)$ are orthogonal martingales for $i \ne j$.


In this setting, the logarithm of the partial likelihood can be
written

(2.4)  $\log PL(t,\beta) = \sum_i \int_0^t \big[\beta z_i(s) - \log\{\sum_j C_j(s)\,e^{\beta z_j(s)}\}\big]\,dN_i(s).$

If we take the partial derivative of (2.4) with respect to $\beta$, we see
that the efficient score process for the partial likelihood is given by

(2.5)  $\dot\ell(t,\beta) = \sum_i \int_0^t \{z_i(s) - \bar\mu(s)\}\,dN_i(s),$

where

(2.6)  $\bar\mu(s) = \dfrac{\sum_j C_j(s)\,z_j(s)\,e^{\beta z_j(s)}}{\sum_j C_j(s)\,e^{\beta z_j(s)}}.$


In (2.6) and elsewhere, we interpret 0/0 as 0. It follows from
algebra that

(2.7)  $\dot\ell(t,\beta) = \sum_i \int_0^t \{z_i(s) - \bar\mu(s)\}\,dM_i(s).$
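The algebra behind (2.7) can be spelled out in a few lines. The sketch below uses only the definitions of this section: $\lambda_i$ is the intensity of $N_i$, $M_i$ is the martingale of (2.2), and $\bar\mu(s)$ denotes the weighted average of covariates defined in (2.6).

```latex
\begin{align*}
\dot\ell(t,\beta)
  &= \sum_i \int_0^t \{z_i(s)-\bar\mu(s)\}\,dN_i(s) \\
  &= \sum_i \int_0^t \{z_i(s)-\bar\mu(s)\}\,dM_i(s)
     + \int_0^t \lambda_0(s)\sum_i C_i(s)\,\{z_i(s)-\bar\mu(s)\}\,
       e^{\beta z_i(s)}\,ds , \\
\sum_i C_i(s)\,\{z_i(s)-\bar\mu(s)\}\,e^{\beta z_i(s)}
  &= \sum_i C_i(s)\,z_i(s)\,e^{\beta z_i(s)}
     - \bar\mu(s)\sum_i C_i(s)\,e^{\beta z_i(s)} = 0 .
\end{align*}
```

The drift term vanishes identically by the definition of the weighted average, which leaves the stochastic integral with respect to the $M_i$ in (2.7).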




The integrands $z_i(s) - \bar\mu(s)$ are bounded, $\mathcal{F}_t$-predictable
functions. It follows from the theory of stochastic integrals (see Gill
(1980), p. 10) that $\dot\ell(t,\beta)$ is a local martingale with
predictable quadratic variation

(2.8)  $\langle \dot\ell(t,\beta), \dot\ell(t,\beta)\rangle = \sum_i \int_0^t \{z_i(s) - \bar\mu(s)\}^2\,\lambda_i(s)\,ds.$


Andersen and Gill (1981) proceed to impose conditions which imply that,
for some fixed function $I(\cdot)$ on [0,1],

$\sup_{t \in [0,1]} \big| n^{-1}\langle \dot\ell(t,\beta), \dot\ell(t,\beta)\rangle - I(t) \big| \xrightarrow{P} 0$

as the number of patients $n \to \infty$. They are able to conclude from the
martingale central limit theorem of Rebolledo (1980) that
$n^{-1/2}\,\dot\ell(\cdot,\beta)$ converges weakly as $n \to \infty$ to a
Gaussian independent-increments process on [0,1] with variance function
$I(\cdot)$. Andersen and Gill use multidimensional $z_i(\cdot)$ processes
without the assumption that the covariates are bounded, but the basic
idea is as described here.

It will be useful in this chapter if we introduce a new family
of $\sigma$-algebras. First define the ordered event times
$t_{(1)} < t_{(2)} < t_{(3)} < \cdots$, and let $t_{(m)} = \infty$ if fewer
than m events occur. Also take $t_{(0)} = 0$. Now define

(2.9)  $\mathcal{F}^*_k = \mathcal{F}_{t_{(k+1)}-}, \quad k = 0, 1, 2, \ldots,$

and $\mathcal{F}^*_\infty = \mathcal{F}_\infty$. Thus, $\mathcal{F}^*_k$
represents everything that happens until just prior to the (k+1)st
observed event, together with knowledge of when, if ever, the (k+1)st
event occurs.




Define

(2.10)  $Y_k = \dot\ell(t_{(k)}, \beta), \quad k \ge 0,$

(2.11)  $X_k = Y_k - Y_{k-1}, \quad k \ge 1,$

and

(2.12)  $v_k = \mathrm{var}(X_k \mid \mathcal{F}^*_{k-1}), \quad k \ge 1.$

Then, at least under sufficient regularity and probably in general,
$\{Y_k, \mathcal{F}^*_k\}_{k=0}^{\infty}$ is a martingale for which the
conditional variances of the martingale differences satisfy

(2.13)  $\sum_{i=1}^{k} v_i = -\ddot\ell(t_{(k)}, \beta).$

Note that

(2.14)  $-\ddot\ell(t,\beta) = \sum_i \int_0^t \dfrac{\sum_j C_j(s)\,\{z_j(s) - \bar\mu(s)\}^2\,e^{\beta z_j(s)}}{\sum_j C_j(s)\,e^{\beta z_j(s)}}\,dN_i(s).$

The heuristic argument is as in Section 1.1: conditional on
$\mathcal{F}^*_{k-1}$, the k-th event consists of nature randomly choosing
a patient out of the risk set at time $t_{(k)}$, with probabilities
proportional to the weights $e^{\beta z_j(t_{(k)})}$. Thus, $X_k$ is the
difference between the covariate of the patient chosen and the weighted
average $\bar\mu(t_{(k)})$ of covariates, so that
$E(X_k \mid \mathcal{F}^*_{k-1}) = 0$. Furthermore, $v_k$ is the variance of
the covariates in the weighted distribution, so that (2.13) holds.
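The heuristic can be checked numerically. The following is a minimal simulation sketch, not part of the original argument: it assumes time-fixed covariates and no censoring other than death, so that the order of deaths is a weighted draw without replacement from the risk set. The function names are hypothetical. It computes the increments $X_k$ and conditional variances $v_k$ and compares the Monte Carlo averages of $(\sum X_k)^2$ and $\sum v_k$, which should agree if the $X_k$ are martingale differences with conditional variances $v_k$.

```python
import math
import random


def simulate_score_increments(z, beta, rng):
    """One simulated trial with simultaneous entry and time-fixed covariates.

    Assumption (not from the text): no censoring other than death, so the
    order of deaths is a weighted draw without replacement from the risk set.
    Returns (X, v): the score increments X_k and conditional variances v_k.
    """
    at_risk = list(range(len(z)))
    X, v = [], []
    while at_risk:
        w = [math.exp(beta * z[j]) for j in at_risk]
        total = sum(w)
        mean = sum(wj * z[j] for wj, j in zip(w, at_risk)) / total
        var = sum(wj * (z[j] - mean) ** 2 for wj, j in zip(w, at_risk)) / total
        # Nature picks the next death with probability proportional to w.
        pos = rng.choices(range(len(at_risk)), weights=w)[0]
        X.append(z[at_risk[pos]] - mean)
        v.append(var)
        at_risk.pop(pos)
    return X, v


def monte_carlo_check(n_trials=2000, n_patients=30, beta=0.5, seed=0):
    """Average S = sum X_k, S^2 and T = sum v_k over many simulated trials."""
    rng = random.Random(seed)
    z = [i % 2 for i in range(n_patients)]  # two treatment groups, z in {0, 1}
    sum_s = sum_s2 = sum_t = 0.0
    for _ in range(n_trials):
        X, v = simulate_score_increments(z, beta, rng)
        s, t = sum(X), sum(v)
        sum_s += s
        sum_s2 += s * s
        sum_t += t
    return sum_s / n_trials, sum_s2 / n_trials, sum_t / n_trials


if __name__ == "__main__":
    mean_s, mean_s2, mean_t = monte_carlo_check()
    print("mean S:", mean_s, " mean S^2:", mean_s2, " mean T:", mean_t)
```

The average of S should be near 0, and the averages of S squared and of T should be close to each other, up to Monte Carlo error.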


2.2.  Approximation of the Score Process by a Brownian Motion


For $u \in [0,\infty)$, define

(2.15)  $k(u) = \sup\{k : \sum_{i=1}^{k} v_i \le u\} = \sup\{k : -\ddot\ell(t_{(k)}, \beta) \le u\}.$

Now define the information time version of the efficient score process,
for $u \in [0,\infty)$, by

(2.16)  $S(u) = Y_{k(u)} = \dot\ell(t_{(k(u))}, \beta).$
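As an illustration of the time change, the sketch below evaluates k(u) and S(u) from the increments $X_k$ and conditional variances $v_k$ of a single trial, reading k(u) as the largest k with $v_1 + \cdots + v_k \le u$ and taking $Y_0 = 0$. The function name and the list-based interface are assumptions made for this example.

```python
from bisect import bisect_right
from itertools import accumulate


def information_time_score(X, v):
    """Return the maps u -> k(u) and u -> S(u) for one trial.

    X[k-1] and v[k-1] hold the k-th score increment and its conditional
    variance; Y_0 = 0 and Y_k = X_1 + ... + X_k.
    """
    cum_v = list(accumulate(v))            # v_1 + ... + v_k for k = 1..n
    Y = [0.0] + list(accumulate(X))        # Y_0, Y_1, ..., Y_n

    def k_of(u):
        # Number of partial sums <= u, i.e. the largest k with sum <= u.
        return bisect_right(cum_v, u)

    def S(u):
        return Y[k_of(u)]

    return k_of, S


if __name__ == "__main__":
    k_of, S = information_time_score([0.5, -0.5, 1.0], [0.25, 0.25, 0.5])
    for u in (0.1, 0.25, 0.6, 5.0):
        print(u, k_of(u), S(u))
```

Once u exceeds the total information of the trial, k(u) stays at the number of observed events and S(u) stays constant.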


Also define $\mathcal{G}_u = \mathcal{F}^*_{k(u)}$, $u \in [0,\infty)$, and

(2.17)  $T = -\ddot\ell(\infty, \beta) = \sum_{i=1}^{\infty} v_i.$

Since $v_k$ is $\mathcal{F}^*_{k-1}$-measurable, k(u) is a stopping time with
respect to the family of $\sigma$-algebras
$\{\mathcal{F}^*_k,\ k = 0, 1, \ldots, \infty\}$. Hence, it is easy to show
via the martingale convergence theorem that $S(\cdot)$ is a
$\mathcal{G}_u$-martingale for $u \in [0,\infty)$. Also, T is a
$\mathcal{G}_u$ stopping time since

$\{T \le u\} = \bigcap_n \{\sum_{i=1}^{n} v_i \le u\} = \bigcap_n \{k(u) \ge n\} \in \mathcal{G}_u.$

The $\mathcal{G}_u$-martingale $S(\cdot)$ is seen to remain constant for
$u \ge T$.

By (2.13), the predictable quadratic variation process of $S(\cdot)$
is $-\ddot\ell(t_{(k(\cdot))}, \beta)$, so that by (2.15) the predictable
quadratic variation of $S(\cdot)$ at time u is between $u - B^2$ and u.
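The two-sided bound on the predictable quadratic variation can be made explicit. The lines below are a sketch for $u < T$, writing $\langle S\rangle(u)$ for the predictable quadratic variation and using only the definition of k(u) and the bound $|z_j| \le B$ on the covariates.

```latex
\begin{align*}
\langle S\rangle(u) = \sum_{i=1}^{k(u)} v_i
  &\le u < \sum_{i=1}^{k(u)+1} v_i
   = \langle S\rangle(u) + v_{k(u)+1}, \\
v_{k(u)+1}
  &\le \frac{\sum_j C_j\, z_j^2\, e^{\beta z_j}}{\sum_j C_j\, e^{\beta z_j}}
   \le B^2
  \quad\Longrightarrow\quad
  u - B^2 < \langle S\rangle(u) \le u .
\end{align*}
```

The weighted variance of the covariates is at most the weighted second moment, which cannot exceed $B^2$ when every covariate is bounded by B in absolute value.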




Suppose we have a sequence of B-experiments indexed by m,
m = 1, 2, ... . Suppose further that

(2.18)  $P\{T^{(m)} > m\} \to 1$ as $m \to \infty$.

Then it follows immediately from the martingale central limit theorem
of Rebolledo (1980) (see Theorem A.1 of the Appendix) that, as $m \to \infty$,

(2.19)  $m^{-1/2}\,S^{(m)}((\cdot)\,m) \Rightarrow W(\cdot)$

on [0,1], where $W(\cdot)$ is a standard Brownian motion.

However, it seems more enlightening to approximate $S(\cdot)$ directly
by a Brownian motion. To do this, it may be necessary to enlarge the
probability space $\{\Omega, \mathcal{F}, P\}$. Let $A_1, A_2, \ldots$ be
independent random variables distributed uniformly on [0,1] and defined
on another probability space $\{X, \mathcal{A}, \mu\}$. Let
$\{\Omega^*, \mathcal{F}^*, P^*\} = \{\Omega \times X,\ \mathcal{F} \times \mathcal{A},\ P \times \mu\}$
be the product probability space. Define
$\mathcal{F}^{**}_k = \mathcal{F}^*_k \times \mathcal{A}_k$, where
$\mathcal{A}_k = \sigma\{A_1, \ldots, A_k\}$. The starred random variables
$Y^*_k$, $v^*_k$, $S^*(u)$, etc. are just $Y_k$, $v_k$, $S(u)$, etc.
considered as random variables on the product space. The following
proposition is an immediate consequence of Theorem A.2 in the Appendix.

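Proposition 2.1 (Skorokhod representation for $\{Y_k\}$.)

There exists a standard Brownian motion $W(\cdot)$ and a sequence of
random variables $0 = \tau_0 \le \tau_1 \le \tau_2 \le \cdots$ on
$\{\Omega^*, \mathcal{F}^*, P^*\}$ such that (2.20) holds.

(2.20a)  $X^*_k = W(\tau_k) - W(\tau_{k-1})$

(2.20b)  $E(\tau_k - \tau_{k-1} \mid \mathcal{F}^{**}_{k-1}) = v^*_k$

(2.20c)  $\mathrm{var}(\tau_k - \tau_{k-1} \mid \mathcal{F}^{**}_{k-1}) \le 2B^2\,v^*_k$

(2.20d)  $\tau_k$ is $\mathcal{F}^{**}_k$-measurable, and the pre-$\tau_k$
$\sigma$-algebra of $W(\cdot)$ is contained in $\mathcal{F}^{**}_k$.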


Theorem 2.2 (Approximation of the efficient score process by
a Brownian motion)

Let $W(\cdot)$ be the Brownian motion of Proposition 2.1, and let
$\varepsilon > 0$. Then, as $K \to \infty$,

$P\{|S^*(u) - W(u)| < K + u^{1/4+\varepsilon},\ \forall u \in [0,T)\} \to 1,$

uniformly in B-experiments.

Proof. By (2.20a) and the definition of $S^*(\cdot)$,

(2.21)  $S^*(u) = W(\tau_k)$ for $\sum_{i=1}^{k} v^*_i \le u < \sum_{i=1}^{k+1} v^*_i.$

The idea here is to show that $\tau_k - \sum_{i=1}^{k} v^*_i$ is sufficiently
small so that $W(u) - W(\tau_k)$ is small for u between
$\sum_{i=1}^{k} v^*_i$ and $\sum_{i=1}^{k+1} v^*_i$.

By (2.20b,d), $\tau_k - \sum_{i=1}^{k} v^*_i$ is an
$\mathcal{F}^{**}_k$-martingale. By (2.20c) and Kolmogorov's inequality
(see Doob (1953), Theorem 3.2, p. 314), for each m = 1, 2, ... we have


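(2.22)  $P\Big\{\sup_{k \le k(2^m)} \big|\tau_k - \sum_{i=1}^{k} v^*_i\big| \ge 2^{m(1+\varepsilon)/2}\Big\} \le \dfrac{2B^2}{2^{m\varepsilon}}.$

Using the formula

$P\{\sup_{0 \le s \le u} |W(s)| > b\} \le 4\{1 - \Phi(b/\sqrt{u})\},$

we obtain

(2.23)  $P\Big\{\sup_{0 \le u \le 2^m,\ 0 \le h \le 2^{m(1+\varepsilon)/2}} |W(u+h) - W(u)| > 2 \cdot 2^{m/4 + m\varepsilon/2}\Big\}$

        $\le P\Big\{\sup_{0 \le j \le 2^{m/2},\ 0 \le h \le 2^{m(1+\varepsilon)/2}} |W(u_j+h) - W(u_j)| > 2^{m/4 + m\varepsilon/2}\Big\}$

        $\le 2^{m/2} \cdot 4 \cdot \{1 - \Phi(2^{m\varepsilon/4})\},$

where

$u_j = j\,2^{m/2}$ for j = 0, 1, ... .

Since

$\sum_{m=1}^{\infty} \dfrac{2B^2}{2^{m\varepsilon}} < \infty$ and $\sum_{m=1}^{\infty} 2^{m/2}\{1 - \Phi(2^{m\varepsilon/4})\} < \infty,$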

it follows from (2.21), (2.22), and (2.23) that

(2.24)  $P\{|S^*(u) - W(u)| < u^{1/4+\varepsilon},\ u \in [L,T)\} \to 1$

as $L \to \infty$, uniformly in B-experiments. Another application of
Kolmogorov's inequality to $S^*(u)$ for $u \le L$ finishes the proof.

In terms of the observable processes $\dot\ell(t,\beta)$ and
$-\ddot\ell(t,\beta)$, Theorem 2.2 says that, as $K \to \infty$,

(2.25)  $P\big[|\dot\ell(t,\beta) - W\{-\ddot\ell(t,\beta)\}| < K + \{-\ddot\ell(t,\beta)\}^{1/4+\varepsilon},\ \forall\, t \ge 0\big] \to 1,$

uniformly in B-experiments. Thus $\dot\ell(t,\beta)$ looks like a standard
Brownian motion in the time scale determined by $-\ddot\ell(t,\beta)$.
This result can be used to construct sequential tests of known size of
the hypothesis $H_0: \beta = \beta_0$. However, the theorem in the next
section is needed if we wish to calculate the power of a test or to
sequentially estimate $\beta$.
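As a rough illustration of how a test with controlled size could be built from the Brownian approximation, the sketch below uses the reflection-principle bound $P\{\sup_{u \le U} |W(u)| > b\} \le 4\{1 - \Phi(b/\sqrt{U})\}$ to choose a constant boundary b. This is not the construction of the text: the constant boundary, the truncation at a maximum information U, and both function names are assumptions made for the example, and the resulting size is only bounded by (not equal to) the nominal level.

```python
from math import sqrt
from statistics import NormalDist


def crossing_bound(b, max_info):
    """Upper bound 4{1 - Phi(b / sqrt(U))} on P{sup_{u <= U} |W(u)| > b}."""
    return 4.0 * (1.0 - NormalDist().cdf(b / sqrt(max_info)))


def conservative_boundary(alpha, max_info):
    """Boundary b at which the crossing bound equals alpha (hypothetical helper)."""
    return sqrt(max_info) * NormalDist().inv_cdf(1.0 - alpha / 4.0)


if __name__ == "__main__":
    b = conservative_boundary(alpha=0.05, max_info=100.0)
    print("boundary:", b, " bound on size:", crossing_bound(b, 100.0))
```

Under these assumptions, one would reject $H_0$ if the score at $\beta_0$ leaves the band (-b, b) before the observed information reaches U.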

2.3.  Approximation of the Maximum Partial Likelihood Estimator Process
by a Brownian Motion

Let $\hat\beta(t)$ be the maximum partial likelihood estimator of $\beta$
at time t. Let $W(\cdot)$ be the Brownian motion of Proposition 2.1.

Theorem 2.3. Let $\varepsilon > 0$. Then, as $L \to \infty$,

$P\big\{\,\big|[-\ddot\ell\{t,\hat\beta(t)\}] \cdot \{\hat\beta(t) - \beta\} - W[-\ddot\ell\{t,\hat\beta(t)\}]\big| < [-\ddot\ell\{t,\hat\beta(t)\}]^{1/4+\varepsilon}$
    for $-\ddot\ell\{t,\hat\beta(t)\} > L\big\} \to 1$

uniformly in B-experiments.

The proof of Theorem 2.3 has been omitted since it is very similar
to Section 3.4 of the next chapter.

Theorem 2.3 says that if $\hat\beta(t) - \beta$ is normalized by
multiplication by $[-\ddot\ell\{t,\hat\beta(t)\}]$, the resulting process
looks like a standard Brownian motion in the time scale determined by
the estimated observed Fisher information $-\ddot\ell\{t,\hat\beta(t)\}$.
The observable process $[-\ddot\ell\{t,\hat\beta(t)\}] \cdot \hat\beta(t)$
looks like a Brownian motion with drift $\beta$ in the
$-\ddot\ell\{t,\hat\beta(t)\}$ time scale. Thus, the problem of
sequentially estimating $\beta$ or testing $H_0: \beta = \beta_0$ has been
shown to be asymptotically equivalent to the problem of sequentially
estimating or testing the drift of a Brownian motion. There are two
remaining difficulties. The first is that
$-\ddot\ell\{\infty,\hat\beta(\infty)\}$ may be smaller than we would like,
so that the experiment stops giving additional information before we
choose to stop sampling. In terms of the Brownian motion problem, this
would mean that we might not be able to watch the Brownian motion for
as long as we would like to. However, if
$-\ddot\ell\{\infty,\hat\beta(\infty)\} < \infty$, then (2.20d) of
Proposition 2.1 suggests that the behavior of

$\dot\ell(t,\beta)$

and of

$[-\ddot\ell\{t,\hat\beta(t)\}]\,\{\hat\beta(t) - \beta\}$

in information time is approximately that of a Brownian motion until a
stopping time. Thus, the information time at which the experiment ends
does not somehow anticipate what the score process or the maximum
partial likelihood estimator process would have done if they had been
allowed to continue.

The other difficulty results from the fact that
$-\ddot\ell\{t,\hat\beta(t)\}$ may grow very slowly when $|\beta|$ is very
large. To be specific, suppose we are to compare two treatments for a
disease, so that $z_i$ is either 0 or 1, depending on the treatment
group. We wish to conduct a sequential test of $\beta = 0$ by observing
the process $[-\ddot\ell\{t,\hat\beta(t)\}] \cdot \hat\beta(t)$ in the
$-\ddot\ell\{t,\hat\beta(t)\}$ time scale. If we intend to use the
Brownian motion approximation to compute the power function and
expected information time until stopping for our sequential test,
then Theorem 2.3 says that we should continue the trial at least
until $-\ddot\ell\{t,\hat\beta(t)\} \ge L$ for a value of L chosen to make
the Brownian motion approximation good. However, if $\beta$ is very
large, we may see many deaths in treatment group 1 before we see any
deaths in treatment group 0. If all observed deaths are from treatment
group 1, then $\hat\beta(t) = \infty$ and $-\ddot\ell\{t,\hat\beta(t)\} = 0$.
Even if deaths have been observed in both groups,
$-\ddot\ell\{t,\hat\beta(t)\}$ may be small long after one treatment has
shown itself to be much better than the other. To avoid this problem,
one could observe the $\beta = 0$ score process $\dot\ell(t,0)$ in the
$-\ddot\ell(t,0)$ time scale until $-\ddot\ell\{t,\hat\beta(t)\} \ge L$. If
$|\dot\ell(t,0)|$ exceeds some number M before
$-\ddot\ell\{t,\hat\beta(t)\} \ge L$, then we stop and reject
$H_0: \beta = 0$. Otherwise, we begin to observe
$[-\ddot\ell\{t,\hat\beta(t)\}]\,\hat\beta(t)$ in the
$-\ddot\ell\{t,\hat\beta(t)\}$ time scale when
$-\ddot\ell\{t,\hat\beta(t)\}$ exceeds L. If M is reasonably large, then
the probability of stopping before $-\ddot\ell\{t,\hat\beta(t)\} \ge L$ is
small for moderate values of $\beta$, and the power function and
expected information time until stopping can be found from the
Brownian motion approximation. For large values of $|\beta|$, this
procedure does not force us to continue the medical trial long after
common sense tells us to stop.
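The two-stage procedure just described can be sketched as a small decision routine. Everything about the interface is an assumption made for illustration: the trial is summarized by a list of interim looks, and the stage-two rejection rule is passed in as a user-supplied function, since the text does not fix a particular boundary.

```python
def two_stage_monitor(looks, L, M, stage2_reject):
    """Apply the two-stage stopping rule to a sequence of interim looks.

    Each look is a tuple (score0, info_hat, stat):
      score0   -- the beta = 0 score at this look
      info_hat -- estimated observed information at the current estimate
      stat     -- info_hat * beta_hat, the stage-two statistic
    stage2_reject(info_hat, stat) is a user-supplied (hypothetical) rule
    derived from the Brownian motion approximation.
    Returns (decision, index of the look at which it was made).
    """
    for i, (score0, info_hat, stat) in enumerate(looks):
        if info_hat < L:
            # Stage 1: too little information to trust the estimate,
            # so monitor the beta = 0 score against the bound M.
            if abs(score0) > M:
                return "reject", i
        elif stage2_reject(info_hat, stat):
            # Stage 2: information has reached L.
            return "reject", i
    return "continue", len(looks) - 1


if __name__ == "__main__":
    rule = lambda info, stat: abs(stat) > 3.0 * info ** 0.5
    print(two_stage_monitor([(1.0, 0.5, 0.0), (6.0, 0.8, 0.0)], 2.0, 5.0, rule))
```

A large treatment effect triggers the stage-one rule through the score at $\beta = 0$, even while the estimated information is still below L.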


CHAPTER  III 


TRIALS  WITH  STAGGERED  ENTRY  AND  INDEPENDENT 
IDENTICALLY  DISTRIBUTED  PATIENTS 

In Chapter II it was seen from

(2.7)  $\dot\ell(t,\beta) = \sum_i \int_0^t \{z_i(s) - \bar\mu(s)\}\,dM_i(s)$

that $\dot\ell(t,\beta)$ was a local martingale. In the case of staggered
entry of patients, we shall obtain a similar equation

(3.1)  $\dot\ell(t,\beta) = \sum_i \int_{[0,t]} \{z_i(s-y_i) - \bar\mu(t, s-y_i)\}\,dM_i(s),$

where $y_i$ is the entry time of the i-th patient (cf. equation (3.22)).
The $M_i$'s are still local martingales, but now the integrands are
functions of t for each s. Let R(t,s) be the set of patients
who, by calendar time t, have been under observation for s time
units after entry. Then $\bar\mu(t,s)$ is the weighted average of
covariates of patients in R(t,s). One can still use the martingale
property of $M_i$ to study $\dot\ell(t,\beta)$ for a single, fixed value
of t, but the $\dot\ell(t,\beta)$ process itself is no longer a martingale
in general.

This chapter extends the results of Chapter II to the case of
staggered entry. The basic idea is as follows. If we could replace
the random function $\bar\mu(t, s-y_i)$ appearing in (3.1) with a
nonrandom function $\mu(s-y_i)$ not depending on t, then the resulting
integral would be a martingale in t. In a time scale determined by its
predictable quadratic variation, this martingale could be approximated
by a Brownian motion as was done in Chapter II, and one would hope
that this predictable quadratic variation process is well approximated
by $-\ddot\ell(t,\beta)$. In order to guarantee that $\bar\mu(t,s)$ is
sufficiently well approximated by a deterministic function $\mu(s)$, we
will assume that the (one-dimensional) covariate processes and the
censoring times of patients are jointly i.i.d., independently of entry
times. Then the patients in R(t,s) have conditionally i.i.d.
covariates, where this conditional distribution does not depend on t.
If the cardinality of R(t,s) is large, then $\bar\mu(t,s)$ is
approximately nonrandom in the sense that its distribution is very
concentrated around $\mu(s)$. In order to ensure that the risk sets
R(t,s) are large "most of the time", it will be necessary to abandon
the very general counting process and censoring process formulation of
Chapter II and to return to the more standard setting of "deaths" and
right censoring.

3.1.  Notation  and  Formulation  of  the  Model 

Suppose we are given a possibly infinite sequence of entry times
$0 \le y_1 \le y_2 \le \cdots$ such that any interval [0,t] contains only
finitely many $y_i$'s. To the patient i entering at time $y_i$ is
associated a random triple $\{z_i(\cdot), c_i, h_i\}$. The one-dimensional
covariate process $z_i(\cdot)$ defined on $[0,\infty)$ is assumed
left-continuous and bounded in absolute value by a fixed constant B.
The possibly infinite random variable $c_i$ is the time after entry of
censoring, and the random variable $h_i$ is the amount of "accumulated
hazard" which patient i can tolerate before dying. Also given is a
fixed baseline cumulative hazard function $\Lambda(s)$, $s \ge 0$, which
satisfies $\Lambda(0) = 0$, is nondecreasing, and is continuous on
$[0, t^\infty)$, where $t^\infty = \inf\{t : \Lambda(t) = \infty\}$. Fix
$\beta \in R$ and define

(3.2)  $x_i = \inf\{t : \int_0^t e^{\beta z_i(s)}\,d\Lambda(s) > h_i\}.$

The random variable $x_i$ is the survival time of patient i after
entry into the trial. The i-th patient is on test during the time
interval $[y_i,\ y_i + x_i \wedge c_i]$. If $s \le x_i \wedge c_i$, then at
time $y_i + s$ we observe the covariate value $z_i(s)$. At time
$y_i + x_i \wedge c_i$, we observe the death of the i-th patient if
$x_i \le c_i$ and otherwise observe that he is censored. At any time t
there is in effect a second censoring variable $(t - y_i)^+$ in the
sense that the time on test of patient i prior to t is
$x_i \wedge c_i \wedge (t - y_i)^+$. We shall refer to $x_i$, $c_i$ and
$(t - y_i)^+$ as "age" variables - the age of the i-th patient at death,
at censoring, and at time t, respectively.
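The "tolerance to hazard" construction of the survival time is easy to simulate in a special case. The sketch below assumes a time-fixed covariate z and a linear baseline cumulative hazard equal to rate times s, neither of which is required by the text; under those assumptions the accumulated hazard is $e^{\beta z} \cdot \text{rate} \cdot t$ and the infimum defining the death age has a closed form. All function names are hypothetical.

```python
import math
import random


def death_age(h, z, beta, rate=1.0):
    """Age at which the accumulated hazard exp(beta*z)*rate*t reaches h.

    Assumes a time-fixed covariate z and baseline cumulative hazard rate*s.
    """
    return h / (rate * math.exp(beta * z))


def time_on_test(x, c, y, t):
    """Time on test before calendar time t: min of x, c and (t - y)^+."""
    return min(x, c, max(t - y, 0.0))


def draw_patient(z, beta, rng, rate=1.0):
    """Draw a death age using a unit-exponential hazard tolerance h."""
    h = rng.expovariate(1.0)
    return death_age(h, z, beta, rate)


if __name__ == "__main__":
    rng = random.Random(0)
    x = draw_patient(z=1.0, beta=0.5, rng=rng)
    print("death age:", x, " time on test:", time_on_test(x, c=2.0, y=0.5, t=1.5))
```

With a unit-exponential tolerance, the death age drawn this way is exponential with rate $\text{rate} \cdot e^{\beta z}$, which is the proportional hazards model in this special case.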

Our stochastic assumptions are as follows. The pairs
$\{z_i(\cdot), c_i\}$ are independent and identically distributed,
independently of the arrival times. The "tolerance to hazard" random
variables $h_i$ are exponentially distributed with parameter 1,
independently of each other and of everything else. It will also be
assumed that there exist positive numbers $\delta$ and $\eta$ such that

(3.3)  $\Lambda(\delta) > 0$

and

(3.4)  $\mathrm{var}\{z_1(s) \mid x_1 \wedge c_1 \ge s\} \ge \eta$ for $0 \le s \le \delta$.

A medical trial as described above will be referred to as a
$(B,\delta,\eta)$-experiment. A medical trial which satisfies all of the
above conditions except possibly (3.3) and (3.4) will be referred to
as a B-experiment. Some of the asymptotic results of this chapter will
hold uniformly in B-experiments, and all will hold uniformly in
$(B,\delta,\eta)$-experiments.

All probabilities and expectations should be considered as
conditional, given $y_1, y_2, \ldots$ .

It is convenient to introduce the notation

(3.5)  $N_i(t,s) = I\{y_i + x_i \le t,\ x_i \le c_i,\ x_i \le s\}$

to indicate that the i-th patient arrived and died before time t, and
that he was uncensored and of age $\le s$ at the time of death. We also
define the set of patients at risk at time t and age s by

(3.6)  $R(t,s) = \{i : y_i \le t-s,\ x_i \wedge c_i \ge s\}.$

With this notation Cox's (1975) log partial likelihood for $\beta$ can be
expressed by

(3.7)  $\ell(t,\beta) = \sum_i \int_{[0,t]} \big[\beta z_i(s) - \log\{\sum_{j \in R(t,s)} e^{\beta z_j(s)}\}\big]\,N_i(t,ds).$

Differentiating (3.7) with respect to $\beta$ gives the score process

(3.8)  $\dot\ell(t,\beta) = \sum_i \int_{[0,t]} \{z_i(s) - \bar\mu(t,s)\}\,N_i(t,ds),$

where

(3.9)  $\bar\mu(t,s) = \dfrac{\sum_{j \in R(t,s)} z_j(s)\,e^{\beta z_j(s)}}{\sum_{j \in R(t,s)} e^{\beta z_j(s)}}.$

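Minus the second derivative of (3.7) is the observed Fisher information
process

(3.10)  $-\ddot\ell(t,\beta) = \sum_i \int_{[0,t]} \bar\sigma^2(t,s)\,N_i(t,ds),$

where

(3.11)  $\bar\sigma^2(t,s) = \dfrac{\sum_{j \in R(t,s)} \{z_j(s) - \bar\mu(t,s)\}^2\,e^{\beta z_j(s)}}{\sum_{j \in R(t,s)} e^{\beta z_j(s)}} = \dfrac{\sum_{j \in R(t,s)} z_j^2(s)\,e^{\beta z_j(s)}}{\sum_{j \in R(t,s)} e^{\beta z_j(s)}} - \bar\mu^2(t,s).$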
The maximum partial likelihood estimator of $\beta$ is the solution
$\beta = \hat\beta(t)$ of

$\dot\ell(t,\beta) = 0.$

Tests of the hypothesis $H_0: \beta = \beta_0$ can be based on $\hat\beta$
or directly on $\dot\ell(t,\beta_0)$. The usual Taylor series approximation

(3.12)  $0 = \dot\ell(t,\hat\beta) = \dot\ell(t,\beta) + (\hat\beta - \beta)\,\ddot\ell(t,\beta) + \cdots$

indicates that the asymptotic behavior of $\hat\beta$ is intimately
associated with that of $\dot\ell(t,\beta)$, which we now consider.
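To make the definitions concrete, here is a minimal sketch that evaluates the score and the observed information at a calendar time t and solves the score equation by Newton's method. It assumes time-fixed covariates and takes entry times, death ages, censoring ages and covariates as plain lists; these simplifications and the function names are choices made for the example, not part of the text.

```python
import math


def score_and_information(t, beta, y, x, c, z):
    """Score and observed information at calendar time t.

    y, x, c, z are lists of entry times, death ages, censoring ages and
    (time-fixed, by assumption) covariates. A death is observed by time t
    if y_i + x_i <= t and x_i <= c_i; the risk set at age s contains the
    patients with y_j <= t - s and min(x_j, c_j) >= s.
    """
    score, info = 0.0, 0.0
    n = len(y)
    for i in range(n):
        if y[i] + x[i] <= t and x[i] <= c[i]:
            s = x[i]
            risk = [j for j in range(n)
                    if y[j] <= t - s and min(x[j], c[j]) >= s]
            w = [math.exp(beta * z[j]) for j in risk]
            total = sum(w)
            mean = sum(wj * z[j] for wj, j in zip(w, risk)) / total
            second = sum(wj * z[j] ** 2 for wj, j in zip(w, risk)) / total
            score += z[i] - mean            # covariate minus weighted average
            info += second - mean ** 2      # weighted variance in the risk set
    return score, info


def mple(t, y, x, c, z, beta0=0.0, tol=1e-10, max_iter=50):
    """Newton iteration for the root of the score equation."""
    beta = beta0
    for _ in range(max_iter):
        score, info = score_and_information(t, beta, y, x, c, z)
        if info <= 0.0:
            raise ValueError("no information: estimator undefined or infinite")
        step = score / info                 # beta - score / (second derivative)
        beta += step
        if abs(step) < tol:
            return beta
    raise RuntimeError("Newton iteration did not converge")


if __name__ == "__main__":
    inf = float("inf")
    y, x, c, z = [0.0] * 4, [1.0, 2.0, 3.0, 4.0], [inf] * 4, [1, 0, 1, 0]
    print(score_and_information(10.0, 0.0, y, x, c, z))
    print(mple(10.0, y, x, c, z))
```

The Newton step adds score divided by information, which is the first-order Taylor expansion of the score equation solved for the estimate. If all observed deaths come from one group, the information is 0 and the routine raises an error rather than returning an infinite estimate.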

Define

(3.13)  $\Lambda_i(s) = \int_{[0,s]} e^{\beta z_i(u)}\,d\Lambda(u).$

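For $s \ge 0$, let $\hat{\mathcal{F}}_s$ be the $\sigma$-algebra generated by
$y_i$, $c_i$, $\{z_i(u),\ u \ge 0\}$, $I\{x_i \le s\}$, and
$x_i\,I\{x_i \le s\}$, i = 1, 2, ... . Then since

(3.14)  $P\{x_i \in (s, s+\Delta] \mid \hat{\mathcal{F}}_s\} = \{1 - e^{\Lambda_i(s) - \Lambda_i(s+\Delta)}\}\,I\{x_i > s\}$

        $= [\{\Lambda_i(s+\Delta) - \Lambda_i(s)\} + o\{\Lambda_i(s+\Delta) - \Lambda_i(s)\}]\,I\{x_i > s\},$

it follows that

(3.15)  $I\{x_i \le s\} - \Lambda_i(x_i \wedge s)$

is an $\hat{\mathcal{F}}_s$-martingale in $s \ge 0$. Fix $t \ge 0$. Since
$c_i \wedge (t - y_i)^+$ is an $\hat{\mathcal{F}}_s$ stopping time,

(3.16)  $I\{x_i \le s \wedge c_i \wedge (t - y_i)^+\} - \Lambda_i\{x_i \wedge s \wedge c_i \wedge (t - y_i)^+\}$

is also an $\hat{\mathcal{F}}_s$-martingale. Let $\mathcal{F}_{t,s}$ be the
sub-$\sigma$-algebra of $\hat{\mathcal{F}}_s$ containing events which have
been observed by time t and which are of age $\le s$, i.e.
$\mathcal{F}_{t,s}$ is the $\sigma$-algebra generated by $I\{y_i \le t\}$,
$y_i\,I\{y_i \le t\}$, $I\{x_i \le s \wedge c_i \wedge (t - y_i)^+\}$,
$x_i\,I\{x_i \le s \wedge c_i \wedge (t - y_i)^+\}$,
$I\{c_i \le s \wedge x_i \wedge (t - y_i)^+\}$,
$c_i\,I\{c_i \le s \wedge x_i \wedge (t - y_i)^+\}$, and
$\{z_i(u),\ u \in [0,\ s \wedge x_i \wedge c_i \wedge (t - y_i)^+]\}$,
i = 1, 2, ... . Since (3.16) is adapted to $\mathcal{F}_{t,s}$, it follows
that (3.16) is also an $\mathcal{F}_{t,s}$-martingale in $s \ge 0$ for
fixed t.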

Let $A_i(t,ds) = I\{i \in R(t,s)\}\,\Lambda_i(ds)$. Then

(3.17)  $A_i(t,s) = \Lambda_i\{s \wedge x_i \wedge c_i \wedge (t - y_i)^+\}$

and the martingale of (3.16) can be written as

(3.18)  $N_i(t,s) - A_i(t,s).$
Define

(3.19)  $\dot\ell(t,s,\beta) = \sum_i \int_{[0,s]} \{z_i(u) - \bar\mu(t,u)\}\{N_i(t,du) - A_i(t,du)\}.$

It follows from (3.8) and simple algebra that

$\dot\ell(t,t,\beta) = \dot\ell(t,\beta).$

Moreover, the stochastic integral in (3.19) inherits the martingale
property of (3.18) since the integrand is bounded and
$\mathcal{F}_{t,s}$-predictable. Thus, for each fixed t,

(3.20)  $\{\dot\ell(t,s,\beta),\ \mathcal{F}_{t,s}\}$

is a martingale in s (Gill (1980), p. 10 or Liptser and Shiryayev
(1978), p. 268).

This martingale property in s of $\dot\ell(t,s,\beta)$ is the basis of
the analysis of the asymptotic normality of
$\dot\ell(t,\beta) = \dot\ell(t,t,\beta)$ at one fixed point in time found
in Gill (1980) and in Andersen and Gill (1981). The idea is that if
the observations up to time t are viewed in "age time" with patient i
being censored at age time $c_i \wedge (t - y_i)^+$, then the age time
process is equivalent to an experiment with simultaneous entry.
However, this approach does not work if one is interested in the joint
distribution of $\dot\ell(t,\beta)$ at different values of t since
$\dot\ell(t,\beta)$ is not in general a martingale.

Let $N_i(t) = N_i(t,t)$, $A_i(t) = A_i(t,t)$, and
$\mathcal{F}_t = \mathcal{F}_{t,t}$. Then $N_i(t)$ is an indicator for the
event that patient i was observed to die before time t, and $A_i(t)$ is
the accumulated hazard to which patient i has been exposed by time t
while under observation. An argument similar to the one showing that
(3.18) is an $\mathcal{F}_{t,s}$-martingale shows that

(3.21)  $\{N_i(t) - A_i(t),\ \mathcal{F}_t\}$

is a martingale in t.

By a change of variable in (3.19),

(3.22)  $\dot\ell(t,\beta) = \sum_i \int_{[0,t]} \{z_i(s-y_i) - \bar\mu(t, s-y_i)\}\{N_i(ds) - A_i(ds)\}.$

With the notation

$M_i(s) = N_i(s) - A_i(s),$

(3.22) is seen to be the same as (3.1). As was mentioned at the
beginning of this chapter, the plan is to approximate $\bar\mu(t,s)$ by a
deterministic function $\mu(s)$. A natural candidate for $\mu(s)$ is
given by

(3.23)  $\mu(s) = \dfrac{E\{z_1(s)\,e^{\beta z_1(s)};\ x_1 \wedge c_1 \ge s\}}{E\{e^{\beta z_1(s)};\ x_1 \wedge c_1 \ge s\}},$

since an informal law of large numbers argument suggests that
$\bar\mu(t,s)$ should be close to $\mu(s)$ when R(t,s) is large. Again,
0/0 is to be interpreted as 0 in (3.9), (3.11), (3.23), and elsewhere.
Let

(3.24)  $Q(t) = \sum_i \int_{[0,t]} \{z_i(s-y_i) - \mu(s-y_i)\}\{N_i(ds) - A_i(ds)\}.$

Since the integrands in (3.24) are bounded and $\mathcal{F}_t$-predictable,

(3.25)  $\{Q(t),\ \mathcal{F}_t\}$

is a martingale in t. Note that Q(t) can also be written as

(3.26)  $Q(t) = \sum_i \int_{[0,t]} \{z_i(s) - \mu(s)\}\{N_i(t,ds) - A_i(t,ds)\}.$

Define

(3.27)  $N(t) = \sum_i N_i(t)$,  $A(t) = \sum_i A_i(t)$,  and  $D(t) = E\,N(t)$.

Thus, N(t) is the number of deaths observed by time t, A(t) is the
accumulated hazard acquired by all patients while on test before time
t, and D(t) is the expected number of deaths observed before time
t. Also define

(3.28)  $N(t,s) = \sum_i N_i(t,s)$,  $A(t,s) = \sum_i A_i(t,s)$,

and

(3.29)  $r(t) = \dot\ell(t,\beta) - Q(t)$

        $= -\sum_i \int_{[0,t]} \{\bar\mu(t,s) - \mu(s)\}\{N_i(t,ds) - A_i(t,ds)\}$

        $= -\int_{[0,t]} \{\bar\mu(t,s) - \mu(s)\}\{N(t,ds) - A(t,ds)\}.$

The goal of this chapter is to show that $\dot\ell(t,\beta)$ and
$[-\ddot\ell\{t,\hat\beta(t)\}] \cdot \{\hat\beta(t) - \beta\}$ can be well
approximated by $W[-\ddot\ell\{t,\hat\beta(t)\}]$, where $W(\cdot)$ is a
standard Brownian motion. An argument similar to that of Chapter II
shows that the martingale Q(t) is well approximated by
$W\{\langle Q\rangle(t)\}$, where $W(\cdot)$ is a Brownian motion and

(3.30)  $\langle Q\rangle(t) = \sum_i \int_{[0,t]} \{z_i(s) - \mu(s)\}^2\,A_i(t,ds)$

is the predictable quadratic variation process of Q(t). To apply
this result to $\dot\ell(t,\beta)$, it is necessary to show that r(t)
given in (3.29) is small and that $-\ddot\ell(t,\beta)$ is close to
$\langle Q\rangle(t)$. Some Taylor series arguments show that
$\dot\ell(t,\beta)$ is close to
$[-\ddot\ell\{t,\hat\beta(t)\}] \cdot \{\hat\beta(t) - \beta\}$.

Let  0  <  e  <  y^-.  The  final  theorem  will  follow  from  Propositions 
3.1 through 3.6, all of which hold uniformly in
$(B,\delta,\eta)$-experiments, and for $\beta$ in compact intervals. In
addition, Propositions 3.1, 3.2, and 3.6 hold uniformly in
B-experiments.

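Proposition 3.1. As $K \to \infty$,

$P\{|r(t)| < K + D(t)^{1/2-\varepsilon},\ t \ge 0\} \to 1,$

uniformly in B-experiments.

Proposition 3.2. As $K \to \infty$,

$P\{|\ddot\ell(t,\beta) + \langle Q\rangle(t)| < K + D(t)^{1-2\varepsilon},\ t \ge 0\} \to 1,$

uniformly in B-experiments.

Proposition 3.3. There exists a > 0 such that, as $K \to \infty$,

$P\{\langle Q\rangle(t) + K > a\,D(t),\ t \ge 0\} \to 1.$

Proposition 3.4. (Consistency of $\hat\beta(t)$.) As $L \to \infty$,

$P\big[|\hat\beta(t) - \beta| < \{-\ddot\ell(t,\beta)\}^{\varepsilon - 1/2}$ for $-\ddot\ell(t,\beta) > L\big] \to 1.$

Proposition 3.5. As $L \to \infty$,

$P\big[|\ddot\ell\{t,\hat\beta(t)\}\{\hat\beta(t) - \beta\} + \dot\ell(t,\beta)| < \{-\ddot\ell(t,\beta)\}^{3\varepsilon}$ for $-\ddot\ell(t,\beta) > L\big] \to 1.$

Proposition 3.6. There exists a standard Brownian motion $W(\cdot)$
such that

$P\{|Q(t) - W\{\langle Q\rangle(t)\}| < K + \langle Q\rangle(t)^{1/4+\varepsilon}$ for $t \ge 0\} \to 1$ as $K \to \infty$.

This holds uniformly in B-experiments.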

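Finally, we get to the main theorem.

Theorem 3.7. There exists a standard Brownian motion $W(\cdot)$ such
that, as $K \to \infty$,

$P\big[|\dot\ell(t,\beta) - W\{-\ddot\ell(t,\beta)\}| < K + \{-\ddot\ell(t,\beta)\}^{1/2-\varepsilon}$ for all $t \ge 0\big] \to 1,$

and, as $L \to \infty$,

$P\big[\,\big|[-\ddot\ell\{t,\hat\beta(t)\}] \cdot \{\hat\beta(t) - \beta\} - W[-\ddot\ell\{t,\hat\beta(t)\}]\big|$
    $< [-\ddot\ell\{t,\hat\beta(t)\}]^{1/2-\varepsilon}$ for $-\ddot\ell\{t,\hat\beta(t)\} > L\big] \to 1.$

Furthermore, the convergence is uniform in $(B,\delta,\eta)$-experiments,
and for $\beta$ in compact intervals.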



3.2.  Approximation of $\dot\ell(t,\beta)$ by the Martingale Q(t)

This section will show that

$r(t) = \dot\ell(t,\beta) - Q(t)$

is uniformly small in the sense of Proposition 3.1. The martingale
property in s of $N_i(t,s) - A_i(t,s)$ for fixed t is used to show
that

$E\,r^2(t) = O\{3 + \log D(t)\},$

uniformly in $t \in [0,\infty]$. The Chebyshev inequality is then applied
to show that

$|r(t_k)| < K + \tfrac{1}{2} D(t_k)^{1/2-\varepsilon}$

holds with high probability for all $t_k$'s in a certain sequence which
increases to $+\infty$. Crude estimates which show that r(t) does not
vary too much between $t_k$'s finish the proof of Proposition 3.1.

Lemma 3.8. For all $t \in [0,\infty]$,

$E\,r^2(t) \le 4B^2\,e^{4B|\beta|}\,\{3 + \log D(t)\}.$

Proof. From fundamental properties of stochastic integrals,

(3.31)  $E\,r^2(t) = E\big[\sum_i \int_{[0,t]} \{\bar\mu(t,s) - \mu(s)\}^2\,N_i(t,ds)\big].$

By considering the i-th term and conditioning on $x_i$, $R(t,x_i)$, and
the event $\{x_i \le c_i \wedge (t - y_i)^+\}$, we obtain

(3.32)  $E\,r^2(t) = E\Big\{\sum_i N_i(t,x_i)\, E\big[\{\mu(t,x_i) - \mu(x_i)\}^2 \,\big|\, x_i,\ R(t,x_i),\ x_i \le c_i \wedge (t-y_i)^+\big]\Big\}$


<.“!•'  4l  7^4  {k<v  Z  W  .W 

\i  |R(t,x.)r  Ll°  j€R(t,x.)  3 

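where $\mu_v(s) = E(z_j^v(s)\, e^{\beta z_j(s)} \mid x_j \wedge c_j \ge s)$ for $v = 0$ and 1, and $|A|$ denotes the cardinality of the set $A$. Let $R_i(t,s) = R(t,s) - \{i\}$, and observe that given $x_i$, $x_i \le c_i \wedge (t-y_i)^+$, and $R_i(t,x_i) = \{j_1, \ldots, j_m\}$, the variables $z_{j_1}(x_i), \ldots, z_{j_m}(x_i)$ are independent and identically distributed with

$E\big[z_{j_1}^v(x_i)\, e^{\beta z_{j_1}(x_i)} \,\big|\, x_i,\ R_i(t,x_i),\ x_i \le c_i \wedge (t-y_i)^+\big] = \mu_v(x_i)$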


for $v = 0$ or 1. Hence except for terms involving $i$, the conditional expectations on the right-hand side of (3.32) involve the square of a sum of i.i.d. random variables having mean 0 and variance less than $4B^2 e^{2|\beta|B}$. Thus

$E\,r^2(t) \le 4B^2 e^{4|\beta|B}\, E\{3 + \log N(t)\}$

$\le 4B^2 e^{4|\beta|B}\{3 + \log D(t)\}$ .




The  second  to  last  inequality  follows  from 

N  1  1 

I  (7  +  +  lo8  N}, 

i=l  1  1 

and  the  final  inequality  follows  from  the  Jensen  inequality.  This 
finishes  the  proof  of  Lemma  3.8. 
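The last two steps rest on two elementary facts: a harmonic sum grows like a logarithm, and $E \log N \le \log E N$ by Jensen's inequality. The sketch below checks both numerically. It assumes the familiar form $\sum_{i \le N} 1/i \le 1 + \log N$ of the harmonic bound; the function names and the toy distribution are illustrative and not part of the original argument.

```python
import math

def harmonic(n: int) -> float:
    """Return the harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / i for i in range(1, n + 1))

def jensen_gap(pmf: dict) -> float:
    """Return log(E N) - E(log N) for a pmf supported on positive integers.

    Jensen's inequality for the concave logarithm says this gap is >= 0.
    """
    mean = sum(n * p for n, p in pmf.items())
    mean_log = sum(math.log(n) * p for n, p in pmf.items())
    return math.log(mean) - mean_log

# Harmonic bound: H_n <= 1 + log n.
for n in (1, 10, 1000):
    assert harmonic(n) <= 1.0 + math.log(n)

# Hypothetical distribution for a death count N (assumption for illustration).
toy_pmf = {1: 0.2, 5: 0.5, 40: 0.3}
assert jensen_gap(toy_pmf) >= 0.0
```

Both checks are deterministic, so the block either passes silently or raises an `AssertionError`.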

Let  0  <  e  <  yjr,  and  define  0  =  tg  tj  <_  . . .  £  »  by 

(3.34)  $t_k = \inf\{t : D(t) = k^{1+3\varepsilon}\}$ .

Thus, $t_k = \infty$ if $D(\infty) \le k^{1+3\varepsilon}$. By the Chebyshev inequality and
Lemma  3.8, 


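(3.35)  $P\{|r(t_k)| > \tfrac12 D^{1/2-\varepsilon}(t_k)\} \le \text{const.}\ \dfrac{3 + \log D(t_k)}{D^{1-2\varepsilon}(t_k)} \le \text{const.}\ \dfrac{3 + (1+3\varepsilon)\log k}{k^{1+\varepsilon-6\varepsilon^2}}$

for $t_k < \infty$. Since $1+\varepsilon-6\varepsilon^2 > 1$, the following lemma now follows easily.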
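The exponent appearing in (3.35) comes from substituting $D(t_k) = k^{1+3\varepsilon}$, which is (3.34) for finite $t_k$. As a worked check of the arithmetic only (a sketch, not a replacement for the proof):

```latex
\begin{align*}
D(t_k)^{1-2\varepsilon} &= k^{(1+3\varepsilon)(1-2\varepsilon)} = k^{1+\varepsilon-6\varepsilon^2}, \\
1+\varepsilon-6\varepsilon^2 > 1 &\iff \varepsilon(1-6\varepsilon) > 0 \iff 0 < \varepsilon < \tfrac16, \\
\sum_{k \ge n} \frac{3+(1+3\varepsilon)\log k}{k^{1+\varepsilon-6\varepsilon^2}} &\longrightarrow 0 \quad \text{as } n \to \infty .
\end{align*}
```

The last line is what allows the probabilities of the exceptional events to be added over $k \ge n$.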

Lemma  3.9.  For  0  <  e  <  and  (tk>  as  above. 


lim  P{ir(t.)| 
ir*o° 


<  n2  ♦  \ 


for  k  >  n)  -►  1  . 


The proof of Proposition 3.1 is completed by Lemmas 3.10 and 3.11, which show that $r(t)$ does not vary too much between 0 and $t_n$ and between $t_k$ and $t_{k+1}$ for $k \ge n$, respectively.


Lemma 3.10. $P\{\max_{0 \le t \le t_n} |r(t)| < n^2\} \to 1$ as $n \to \infty$ .

Proof. Note that if $\mathcal{B}(p)$ is a Bernoulli variable with parameter $p$, then for $p \le \tfrac12$, $\mathcal{B}(p)$ is stochastically smaller than a Poisson variable with parameter $2p$, say $\mathcal{P}(2p)$. Hence for all $0 \le p \le 1$, $\mathcal{B}(p)$ is stochastically smaller than $2p + \mathcal{P}(2p)$. Since $N(t_n)$ is a sum of independent Bernoulli variables with $E\,N(t_n) \le n^{1+3\varepsilon}$, it follows that $N(t_n)$ is stochastically less than $2n^{1+3\varepsilon} + \mathcal{P}(2n^{1+3\varepsilon})$.

By the central limit theorem,

(3.36)  $P\{N(t_n) > n^{1+4\varepsilon}\} \to 0$ as $n \to \infty$ .

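On $\{A(t_n) > n^{1+5\varepsilon}\}$, $N(t_n)$ is stochastically larger than a Poisson random variable with mean $n^{1+5\varepsilon}$ and hence

(3.37)  $P\{N(t_n) \le n^{1+4\varepsilon},\ A(t_n) > n^{1+5\varepsilon}\} \le P\{\mathcal{P}(n^{1+5\varepsilon}) \le n^{1+4\varepsilon}\} \to 0$ as $n \to \infty$ ,

again by the central limit theorem.

By (3.36) and (3.37),

(3.38)  $P\{A(t_n) > n^{1+5\varepsilon}\} \to 0$ as $n \to \infty$ .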

By (3.29),

(3.39)  $\max_{0 \le t \le t_n} |r(t)| \le 2B\{N(t_n) + A(t_n)\}$ .

Together, (3.36), (3.38), and (3.39) imply Lemma 3.10.
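The Bernoulli-Poisson comparison at the start of this proof is easy to verify directly. A minimal sketch, assuming only the standard fact that for nonnegative integer variables stochastic ordering means ordering of all upper tail probabilities; the function name is illustrative.

```python
import math

def bernoulli_below_poisson(p: float) -> bool:
    """True if Bernoulli(p) is stochastically smaller than Poisson(2p).

    The Bernoulli variable never exceeds 1, so the only tail that needs
    checking is P(X >= 1):  p <= 1 - exp(-2p).
    """
    return p <= 1.0 - math.exp(-2.0 * p)

# The comparison holds on the whole range 0 <= p <= 1/2 ...
grid = [i / 1000 for i in range(0, 501)]
assert all(bernoulli_below_poisson(p) for p in grid)

# ... but can fail for p close to 1, which is why the proof switches to the
# cruder bound 2p + Poisson(2p) when p ranges over all of [0, 1].
assert not bernoulli_below_poisson(0.9)
```

For $p > \tfrac12$ the constant $2p$ alone already exceeds 1, so $2p + \mathcal{P}(2p)$ dominates the Bernoulli variable trivially.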


Lemma 3.11.

$P\{\max_{t_k \le t \le t_{k+1}} |r(t) - r(t_k)| \le \tfrac12 D^{1/2-\varepsilon}(t_k)$ for all $k \ge n\} \to 1$

as $n \to \infty$.


Proof. Let $D_k = N(t_k) - N(t_{k-1})$ be the number of deaths observed in $(t_{k-1}, t_k]$. By (3.34)

(3.40)  $E\,D_k \le k^{1+3\varepsilon} - (k-1)^{1+3\varepsilon} \le (1+3\varepsilon)k^{3\varepsilon}$ .

$D_k$ is a sum of independent Bernoulli variables, so that by the argument in the proof of Lemma 3.10, $D_k$ is stochastically smaller than $3k^{3\varepsilon} + \mathcal{P}(3k^{3\varepsilon})$. By easy large deviation estimates (see Feller (1971), p. 549, Theorem 1),

(3.41)  $P\{D_k < k^{7\varepsilon/2}$ for all $k \ge n\} \to 1$ as $n \to \infty$ .


Let $H_k = A(t_k) - A(t_{k-1})$ be the hazard accumulated by patients while under observation in $(t_{k-1}, t_k]$. On $\{H_k \ge k^{4\varepsilon}\}$, $D_k$ is stochastically larger than a Poisson random variable with mean $k^{4\varepsilon}$ and hence

(3.42)  $P\{D_k < k^{7\varepsilon/2},\ H_k \ge k^{4\varepsilon}\} \le P\{\mathcal{P}(k^{4\varepsilon}) \le k^{7\varepsilon/2}\}$ .

By (3.41), (3.42), and another easy large deviations estimate,

(3.43)  $P\{H_k < k^{4\varepsilon}$ for all $k \ge n\} \to 1$ as $n \to \infty$ .


Let $t_k \le t \le t_{k+1}$. By (3.29),

(3.44)  $r(t) - r(t_k) = \int_{[0,t]} \{\mu(t,s) - \mu(s)\}\{N(t,ds) - A(t,ds) - N(t_k,ds) + A(t_k,ds)\}$

$+ \int_{[0,t_k]} \{\mu(t,s) - \mu(t_k,s)\}\{N(t_k,ds) - A(t_k,ds)\}$ .

By assumption, the $z_i(\cdot)$ are bounded by $B$ and hence $\mu(t,s)$ and $\mu(s)$ are bounded by $B$. It follows that the first term is dominated by $2B\{D_{k+1} + H_{k+1}\}$ uniformly in $t_k \le t \le t_{k+1}$. By (3.41) and (3.43) it suffices to consider the second integral in (3.44). Let

(3.45)  $m(t,s) = \sum_{j \in R(t,s)} e^{\beta z_j(s)}$

and observe that

(3.46)  $A(t,ds) = m(t,s)\, d\Lambda(s)$ .


We find from (3.9) and some algebra that, uniformly in $t_k \le t \le t_{k+1}$,


!m(t.  ,,  s)  -  m(tk,  s)) 
- ™(tk+1,  s) - } 


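Hence by (3.44) and (3.47) it suffices to show

(3.48)  $P\Big\{\text{For all } k \ge n,\ \int_{[0,t_k]} \dfrac{m(t_{k+1},s) - m(t_k,s)}{m(t_{k+1},s)}\, N(t_k,ds) < k^{4\varepsilon}\Big\} \to 1$ as $n \to \infty$ ,

and

(3.49)  $P\Big\{\text{For all } k \ge n,\ \int_{[0,t_k]} \dfrac{m(t_{k+1},s) - m(t_k,s)}{m(t_{k+1},s)}\, A(t_k,ds) < k^{4\varepsilon}\Big\} \to 1$ as $n \to \infty$ .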


From (3.46) and some algebra we see that the $k$-th integral in (3.49) is majorized by $H_{k+1}$, so (3.49) follows from (3.43).

Now consider (3.48). It is easily seen by direct calculation that

$L_k(s) = \int_{[0,s]} \dfrac{m(t_{k+1},u) - m(t_k,u)}{m(t_{k+1},u)}\, N(t_k,du) - \{N(t_{k+1},s) - N(t_k,s)\}$

is a supermartingale for $0 \le s \le t_k$, which changes by jumps downward of size 1 and upward of size at most equal to 1. Furthermore, $N(t_{k+1},t_k) - N(t_k,t_k) \le D_{k+1}$, so by (3.41), to prove (3.48) it suffices to show

(3.50)  $P\{\text{For all } k \ge n,\ L_k(t_k) < \tfrac12 k^{4\varepsilon}\} \to 1$ as $n \to \infty$ .


Let $S_0 = 0$, and for $j = 1, 2, \ldots$ let

$S_j = \inf\{s : s > S_{j-1},\ L_k(s) - L_k(S_{j-1}) > 1 \text{ or } < 0\}$ ,

where it is understood that $\inf \emptyset = t_k$. Obviously $-1 \le L_k(S_j) - L_k(S_{j-1}) \le 2$, and from the supermartingale property we see that on $\{S_{j-1} < t_k\}$,

$E\{L_k(S_j) - L_k(S_{j-1}) \mid \mathcal{F}_{t_{k+1},S_{j-1}}\} \le 0$ ,

and hence

(3.51)  $P\{S_j < t_k,\ L_k(S_j) - L_k(S_{j-1}) > 1 \mid \mathcal{F}_{t_{k+1},S_{j-1}}\} \le \tfrac12$ .


It follows from (3.51) that between downward jumps the total increase of $L_k(s)$ is stochastically less than $1+w$, where $P\{w = m\} = (\tfrac12)^{m+1}$, $m = 0, 1, \ldots$. Since the total number of downward jumps is $D_{k+1}$, an easy large deviation estimate gives

(3.52)  P(Lk(tk)  >  j  k4e,  Dk+1  <  (k+l)7£/2}  =  o(exp(-k1/7)}  . 


Combining  (3.41)  and  (3.52)  yields  (3.50),  which  in  turn  completes 
the  proof  of  Proposition  3.1. 

Before moving on to the other propositions, let us make a few observations about the proof just completed. The estimate in Lemma 3.8 for $E\,r^2(t)$ is very good in that it is of the same order of magnitude as $E\,r^2(t)$ itself, when there is no censoring. However, the application of the Chebyshev inequality and the addition of probabilities of "bad" sets in the proof of Lemma 3.9 is extremely crude. Only the fact that the bound of Lemma 3.8 is so much smaller than the bound in Proposition 3.1 makes it possible for such heavy-handed methods to work. The proofs of Lemmas 3.10 and 3.11 make use of very crude large deviation methods, though the sloppiness here does not cost us much more than was already lost in Lemma 3.9. The reason why such rough methods were used is that it is very difficult to say anything useful about the distribution of $r(t)$ as a stochastic process in $t$. However, it seems clear that the $r(t)$ process is actually much smaller than is claimed by Proposition 3.1. In fact, it is probably $o\{D^{\varepsilon}(t)\}$ for any $\varepsilon > 0$. This suggests that the error committed by approximating $\dot\ell(t,\beta)$ by a martingale may be quite small even when the sample sizes are moderate.


3.3. Approximation of $\langle Q\rangle(t)$ by $-\ddot\ell(t,\beta)$

This section will prove Propositions 3.2 and 3.3. Proposition 3.2 shows that $\langle Q\rangle(t) + \ddot\ell(t,\beta)$ is $o\{D(t)^{1-2\varepsilon}\}$. The proof of Proposition 3.2 is made relatively painless by the observation that the techniques used in the proof of Proposition 3.1 apply almost without change. I am sure that the reader who has come this far will be grateful for not having to go through another proof as technical and complicated as that of Proposition 3.1.

Proposition 3.2. As $K \to \infty$,

$P\{|\ddot\ell(t,\beta) + \langle Q\rangle(t)| < K + D(t)^{1-2\varepsilon},\ t \ge 0\} \to 1$ ,

uniformly in B-experiments.

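Proof. Note from (3.10) and (3.28) that

(3.53)  $-\ddot\ell(t,\beta) = \int_{[0,t]} \sigma^2(t,s)\, N(t,ds)$

where

(3.11)  $\sigma^2(t,s) = \dfrac{\sum_{j \in R(t,s)} \{z_j(s) - \mu(t,s)\}^2 e^{\beta z_j(s)}}{\sum_{j \in R(t,s)} e^{\beta z_j(s)}}$ .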


Now apply some algebra to (3.30).

(3.30)  $\langle Q\rangle(t) = \sum_i \int_{[0,t]} \{z_i(s) - \mu(s)\}^2 A_i(t,ds)$

$= \sum_i \int_{[0,t]} \{z_i(s) - \mu(t,s) + \mu(t,s) - \mu(s)\}^2 A_i(t,ds)$

$= \int_{[0,t]} \{\mu(t,s) - \mu(s)\}^2 A(t,ds) + \sum_i \int_{[0,t]} \{z_i(s) - \mu(t,s)\}^2 A_i(t,ds)$

$= \int_{[0,t]} \{\mu(t,s) - \mu(s)\}^2 A(t,ds) + \int_{[0,t]} \sigma^2(t,s)\, A(t,ds)$ .
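The algebra applied to (3.30) is the usual bias-variance split: within one risk set, the cross term vanishes because $\mu(t,s)$ is the weighted mean of the $z_j(s)$ under weights $e^{\beta z_j(s)}$. The sketch below checks that identity at a single time point. The common factor $d\Lambda(s)$ is dropped, the constant `c` stands in for $\mu(s)$, and all names and numbers are illustrative assumptions.

```python
import math

def decomposition_terms(z, beta, c):
    """Return (lhs, bias_term, variance_term) for one risk set.

    lhs           = sum_j w_j (z_j - c)^2          with w_j = exp(beta * z_j)
    bias_term     = m * (mu - c)^2                 with m = sum_j w_j
    variance_term = m * sigma2, sigma2 the weighted variance about mu
    """
    w = [math.exp(beta * zj) for zj in z]
    m = sum(w)
    mu = sum(wj * zj for wj, zj in zip(w, z)) / m
    sigma2 = sum(wj * (zj - mu) ** 2 for wj, zj in zip(w, z)) / m
    lhs = sum(wj * (zj - c) ** 2 for wj, zj in zip(w, z))
    return lhs, m * (mu - c) ** 2, m * sigma2

# Hypothetical covariate values for the patients at risk.
lhs, bias, var = decomposition_terms([0.0, 1.0, -0.5, 0.3], beta=0.7, c=0.2)
assert abs(lhs - (bias + var)) < 1e-12
```

Integrating this pointwise identity against $d\Lambda(s)$ gives the last two lines of the display above.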


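It follows from some additional algebra that

(3.54)  $\ddot\ell(t,\beta) + \langle Q\rangle(t) = \int_{[0,t]} \{\mu(t,s) - \mu(s)\}^2 A(t,ds)$

$+ \int_{[0,t]} \{\mu^2(t,s) - \mu^2(s)\}\{N(t,ds) - A(t,ds)\}$

$- \int_{[0,t]} \Big\{\dfrac{\sum_{j \in R(t,s)} z_j^2(s)\, e^{\beta z_j(s)}}{\sum_{j \in R(t,s)} e^{\beta z_j(s)}} - \mu_2(s)\Big\}\{N(t,ds) - A(t,ds)\}$

$- \int_{[0,t]} \sigma^2(s)\{N(t,ds) - A(t,ds)\}$ ,

where

$\mu_2(s) = \dfrac{E\{z_j^2(s)\, e^{\beta z_j(s)};\ x_j \wedge c_j \ge s\}}{E\{e^{\beta z_j(s)};\ x_j \wedge c_j \ge s\}}$ ,

$\sigma^2(s) = \mu_2(s) - \mu^2(s)$ .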


The first three terms on the right-hand side of (3.54) can be estimated by techniques similar to the proof of Proposition 3.1. The first term has the same expectation as

$\int_{[0,t]} \{\mu(t,s) - \mu(s)\}^2 N(t,ds)$ ,

and the expectation of this was shown in the proof of Lemma 3.8 to be bounded above by const.$\{3 + \log D(t)\}$. If we use the same $\{t_k\}$ sequence as was defined in (3.34), then the Markov inequality implies

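(3.55)  $P\Big[\int_{[0,t_k]} \{\mu(t_k,s) - \mu(s)\}^2 A(t_k,ds) > \tfrac14 D^{1-2\varepsilon}(t_k)\Big] \le \text{const.}\ \dfrac{3 + \log D(t_k)}{D(t_k)^{1-2\varepsilon}} \le \text{const.}\ \dfrac{3 + (1+3\varepsilon)\log k}{k^{1+\varepsilon-6\varepsilon^2}}$

for $t_k < \infty$. The rest follows as before.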

The second term follows directly from the techniques of Proposition 3.1. The integrand has the form $a^2 - b^2$, and at certain points in the proof one uses the factorization $a^2 - b^2 = (a+b)(a-b)$ and the boundedness of $(a+b)$. Bounding the third term requires almost no modifications of the proof of Proposition 3.1.

The  last  term  is  a  martingale  in  t,  and  the  Kolmogorov 
inequality  yields  an  easy  proof  that  this  term  is  oCD^ft)}. 

In  (3.54),  the  term  for  which  our  upper  bound  was  the  largest 
was  the  first  term.  However,  the  argument  that  was  given  at  the  end 
of  Section  3.2  for  why  r(t)  is  actually  much  smaller  than  D  (t) 

also  applies  here,  so  that  this  first  term  is  probably  smaller  than 

u 

D  (t)  and  possibly  much  smaller.  The  proof  of  Proposition  3.1  shows 
that  the  second  and  third  terms  of  (3.54)  are  smaller  than  D  (t), 
and it seems clear that these terms are actually much smaller and
therefore  quite  negligible.  The  fourth  term,  however,  is  in  fact 
larger  than  o(D  (t)}.  Thus,  Proposition  3.2  probably  holds  if  we 
replace  D1  2e(t)  by  D*a"*’e  (t) ,  but  it  does  not  hold  with  D^t) . 

In order to attain our goal of approximating $\dot\ell(t,\beta)$ by $W\{-\ddot\ell(t,\beta)\}$, where $W(\cdot)$ is a standard Brownian motion, we need to know that

$r(t) = \dot\ell(t,\beta) - Q(t)$

and

$\ddot\ell(t,\beta) + \langle Q\rangle(t)$

are sufficiently small relative to $\langle Q\rangle(t)$, whereas what we have from Propositions 3.1 and 3.2 is that these processes are small relative to $D(t)$. Proposition 3.3 shows that the $\langle Q\rangle(t)$ process is of the same order of magnitude as the function $D(t)$, and this implies what we need. The rather technical assumption made in (3.3) and (3.4) about $\operatorname{var}\{z_j(s) \mid x_j \wedge c_j \ge s\}$ for small $s$ is necessary in order to guarantee that Proposition 3.3 holds uniformly in configurations of entry times. It seems probable that Propositions 3.1 and 3.2 hold uniformly in B-experiments with $D(t)$ replaced by $\langle Q\rangle(t)$ or $-\ddot\ell(t,\beta)$, and in this case, assumptions (3.3) and (3.4) and Proposition 3.3 would all be unnecessary. However, such a change in Proposition 3.1 would have introduced considerable complications into an already complicated proof. In any case, all but the most suspicious or masochistic readers are encouraged to skip the rather tedious proof of Proposition 3.3 and move on to the more exciting and enlightening Section 3.4.

Proposition 3.3. There exists $a > 0$ such that, as $K \to \infty$,

$P\{\langle Q\rangle(t) + K > aD(t),\ t \ge 0\} \to 1$ ,

uniformly in $(B,\delta,\eta)$-experiments.

Proof. Assume that $D(\infty) = \infty$ and define $t_n$, $n = 1, 2, 3, \ldots$, to satisfy

(3.56)  $D(t_n) = 2^n$ .

Then by the definition (3.27) of $D(t)$,

(3.57)  $\sum_i E\,N_i(t_n,t_n) = 2^n$ .

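Let

(3.58)  $\gamma = 1 - \exp\{-e^{-|\beta|B} \Lambda(\delta)\}$ ,

and note that $\gamma > 0$ by (3.3). It will be necessary to show that

(3.59)  $E\,N_i(t,\delta) \ge \gamma\, E\,N_i(t,t)$ for all $t \ge 0$ .

To do this, recall from (3.5) that

(3.60)  $N_i(t,s) = I\{x_i \le s \wedge c_i \wedge (t-y_i)^+\}$ .

Condition on $c_i$ and $(t-y_i)^+$. If $\delta \ge c_i \wedge (t-y_i)^+$, then $N_i(t,\delta) = N_i(t,t)$. Otherwise, $N_i(t,\delta) = I\{x_i \le \delta\}$ and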

(3.61)  E  Na (t,6)  =  P{xi  <  6} 

(by (3. 2))  *P(Ai(6)<h.} 

>  P{e’'B'B  A (6)  <  \) 
i  Y 

>  Y  E  Nt(t,t)  , 

so  that  (3.59)  follows  easily. 

It follows from (3.57) and (3.59) that

(3.62)  $\sum_i E\,N_i(t_n,\delta) \ge \gamma\, 2^n$ ,

and by the martingale property of (3.18) we have

(3.63)  $\sum_i E\,A_i(t_n,\delta) \ge \gamma\, 2^n$ .


By  (3.4), 


(3.64)  P{|z. (s)-y(s)|  >  ~  x.  a  c.  >  s)  >  - ~  ,  for  0  <  s  <  6 

i  i  i  -  -  g+gB^ 


Thus, we can find $\rho > 0$ such that

(3.65)  $P\{|z_i(s) - \mu(s)| \ge \rho \mid x_i \wedge c_i \ge s\} \ge \rho$ for $0 \le s \le \delta$ .


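Again using the martingale property of (3.18), we get

(3.66)  $E \sum_i \int_0^{\delta} I\{|z_i(s) - \mu(s)| \ge \rho\}\, N_i(t_n,ds)$

$= E \sum_i \int_0^{\delta} I\{|z_i(s) - \mu(s)| \ge \rho\}\, A_i(t_n,ds)$

$\ge e^{-|\beta|B}\, E \sum_i \int_0^{\delta \wedge (t_n-y_i)^+} I\{|z_i(s) - \mu(s)| \ge \rho\}\, I\{x_i \wedge c_i \ge s\}\, \Lambda(ds)$

(by Fubini)  $\ge e^{-|\beta|B} \sum_i \int_0^{\delta \wedge (t_n-y_i)^+} P\{|z_i(s) - \mu(s)| \ge \rho \mid x_i \wedge c_i \ge s\}\, P\{x_i \wedge c_i \ge s\}\, \Lambda(ds)$

(by (3.65))  $\ge \rho\, e^{-|\beta|B} \sum_i \int_0^{\delta \wedge (t_n-y_i)^+} P\{x_i \wedge c_i \ge s\}\, \Lambda(ds)$

$\ge \rho\, e^{-2|\beta|B} \sum_i E \int_0^{\delta} A_i(t_n,ds)$

$= \rho\, e^{-2|\beta|B} \sum_i E\,A_i(t_n,\delta)$

(by (3.63))  $\ge \rho\, e^{-2|\beta|B}\, \gamma\, 2^n$ .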

The random variable

$V_n = \sum_i \int_0^{\delta} I\{|z_i(s) - \mu(s)| \ge \rho\}\, N_i(t_n,ds)$

appearing on the left-hand side of (3.66) is a sum of independent Bernoulli variables, so that its variance is less than its expectation. It is easy to show, by the Chebyshev inequality, that $V_n$ exceeds $\tfrac12 E\,V_n$ except for finitely many $n$, almost surely. But $\rho^2 V_n$ is dominated by

$\sum_i \int_0^{t_n} \{z_i(s) - \mu(s)\}^2 N_i(t_n,ds)$

so that, except for finitely many $n$,

(3.67)  $\sum_i \int_0^{t_n} \{z_i(s) - \mu(s)\}^2 N_i(t_n,ds) \ge \tfrac12 \rho^3 e^{-2|\beta|B}\, \gamma\, 2^n$ .

Using Kolmogorov's inequality and the fact that

$\sum_i \int_0^{t} \{z_i(s) - \mu(s)\}^2 \{N_i(t,ds) - A_i(t,ds)\}$

is a martingale in $t$ shows that, except for finitely many $n$,

(3.68)  $\sum_i \int_0^{t_n} \{z_i(s) - \mu(s)\}^2 A_i(t_n,ds) \ge \tfrac14 \rho^3 e^{-2|\beta|B}\, \gamma\, 2^n$ .

i  J0  1  "  4 


Since the left side of (3.68) is $\langle Q\rangle(t_n)$ and the right side is const. $D(t_n)$, both of which are increasing in $n$, the Proposition follows easily. Minor changes take care of the $D(\infty) < \infty$ case.
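The Chebyshev step in the proof above, stated there as "easy to show", can be spelled out as follows. This is a sketch that writes $c > 0$ for the constant multiplying $2^n$ in the lower bound (3.66) for $E\,V_n$, and uses only that $\operatorname{var} V_n \le E\,V_n$:

```latex
\begin{align*}
P\{V_n \le \tfrac12 E\,V_n\}
  &\le P\{|V_n - E\,V_n| \ge \tfrac12 E\,V_n\}
   \le \frac{\operatorname{var} V_n}{(\tfrac12 E\,V_n)^2}
   \le \frac{4}{E\,V_n}
   \le \frac{4}{c\,2^n}, \\
\sum_{n \ge 1} \frac{4}{c\,2^n} &= \frac{4}{c} < \infty .
\end{align*}
```

By the Borel-Cantelli lemma the events $\{V_n \le \tfrac12 E\,V_n\}$ then occur for only finitely many $n$, almost surely.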

3.4. Consistency of $\hat\beta(t)$ and Approximation of $-\ddot\ell\{t,\hat\beta(t)\}\{\hat\beta(t) - \beta\}$ by $\dot\ell(t,\beta)$

The proofs of Propositions 3.4 and 3.5 make use of the following lemma concerning the third derivative of the log partial likelihood.

Lemma 3.12.

$|\dddot\ell(t,\beta)| \le 2B\{-\ddot\ell(t,\beta)\}$ ,

where this holds for all values of $\beta$, not just the true value.

Proof. By some algebra

(3.69)  $\dddot\ell(t,\beta) = -\sum_i \int_{[0,t]} \mu_3(t,s)\, N_i(t,ds)$ ,

where

(3.70)  $\mu_3(t,s) = \dfrac{\sum_{j \in R(t,s)} \{z_j(s) - \mu(t,s)\}^3 e^{\beta z_j(s)}}{\sum_{j \in R(t,s)} e^{\beta z_j(s)}}$ .

The fact that

$|z_j(s) - \mu(t,s)| \le 2B$

implies

(3.71)  $|\mu_3(t,s)| \le 2B\, \sigma^2(t,s)$ ,

where $\sigma^2(t,s)$ is given by (3.11). The lemma now follows from (3.10).
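Inequality (3.71) is a statement about any exponentially weighted distribution supported on $[-B, B]$: the third central moment is at most $2B$ times the variance in absolute value. A minimal numerical sketch, with hypothetical covariate values and an illustrative function name:

```python
import math

def tilted_moments(z, beta):
    """Mean, variance and third central moment of z under weights exp(beta*z)."""
    w = [math.exp(beta * zj) for zj in z]
    total = sum(w)
    mu = sum(wj * zj for wj, zj in zip(w, z)) / total
    var = sum(wj * (zj - mu) ** 2 for wj, zj in zip(w, z)) / total
    third = sum(wj * (zj - mu) ** 3 for wj, zj in zip(w, z)) / total
    return mu, var, third

B = 2.0
risk_set = [-2.0, -0.5, 0.0, 1.25, 2.0]  # hypothetical values with |z| <= B
for beta in (-1.5, 0.0, 0.8):
    _, var, third = tilted_moments(risk_set, beta)
    # |z_j - mu| <= 2B termwise, so |third| <= 2B * var (tolerance for rounding).
    assert abs(third) <= 2.0 * B * var + 1e-12
```

The bound holds for every `beta`, which mirrors the remark in Lemma 3.12 that the inequality is not restricted to the true parameter value.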
Proposition 3.4. As $L \to \infty$,

$P[\,|\hat\beta(t) - \beta| < \{-\ddot\ell(t,\beta)\}^{\varepsilon-1/2}$ for $-\ddot\ell(t,\beta) > L\,] \to 1$ ,

uniformly in $(B,\delta,\eta)$-experiments.

Proof. The proof is based on the following observation. Since $-\ddot\ell(t,\beta+\Delta)$ is nonnegative for all $t \ge 0$ and $\Delta \in \mathbb{R}$, $\dot\ell(t,\beta+\Delta)$ is nonincreasing and continuous in $\Delta$. Thus, if for some $\delta > 0$ we have

(3.72)  $\dot\ell(t,\beta-\delta) > 0 > \dot\ell(t,\beta+\delta)$ ,

then $\hat\beta(t)$ exists and

$|\hat\beta(t) - \beta| < \delta$ .

Furthermore, if (3.72) holds, then $\dot\ell(t,\beta+\Delta)$ is strictly decreasing and $\hat\beta(t)$ is unique. The proof proceeds by showing that $\dot\ell(t,\beta)$ is small compared to $-\ddot\ell(t,\beta)$, so that (3.72) holds for a small $\delta$.

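An easy argument using Kolmogorov's inequality shows

(3.73)  $P[\,|Q(t)| \le \tfrac12\{\langle Q\rangle(t)\}^{(\varepsilon+1)/2}$ for $\langle Q\rangle(t) \ge L\,] \to 1$ as $L \to \infty$ .

By Propositions 3.1, 3.2, and 3.3 we can replace $Q(t)$ by $\dot\ell(t,\beta)$ and $\langle Q\rangle(t)$ by $-\ddot\ell(t,\beta)$ in (3.73) to get

(3.74)  $P[\,|\dot\ell(t,\beta)| \le \{-\ddot\ell(t,\beta)\}^{(\varepsilon+1)/2}$ for $-\ddot\ell(t,\beta) \ge L\,] \to 1$ ,

uniformly in $(B,\delta,\eta)$-experiments.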


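For $\delta > 0$, there exists $\delta^*$ with $\delta > \delta^* > 0$ such that

(3.75)  $\dot\ell(t,\beta+\delta) = \dot\ell(t,\beta) + \delta\, \ddot\ell(t,\beta+\delta^*)$ .

From Lemma 3.12, it is easy to show that

(3.76)  $-\ddot\ell(t,\beta+\delta^*) \ge -\ddot\ell(t,\beta)\,(1 - 2B\delta)$ .

By (3.74) and (3.76),

(3.77)  $\dot\ell(t,\beta+\delta) \le \{-\ddot\ell(t,\beta)\}^{(\varepsilon+1)/2} - \delta\{-\ddot\ell(t,\beta)\}(1 - 2B\delta)$

holds with high probability for all $\delta > 0$ and for all $t$ satisfying $-\ddot\ell(t,\beta) > L$ when $L$ is sufficiently large. If we set $\delta = \{-\ddot\ell(t,\beta)\}^{\varepsilon-1/2}$ in (3.77), then the right side becomes

(3.78)  $\{-\ddot\ell(t,\beta)\}^{(\varepsilon+1)/2}\,\big[\,1 - \{-\ddot\ell(t,\beta)\}^{\varepsilon/2}\,\big(1 - 2B\{-\ddot\ell(t,\beta)\}^{\varepsilon-1/2}\big)\big]$ .

If $L$ is sufficiently large, then (3.78) is negative whenever $-\ddot\ell(t,\beta) > L$. This, together with a similar argument for $\dot\ell(t,\beta-\delta)$, shows that

(3.79)  $\dot\ell[t,\ \beta - \{-\ddot\ell(t,\beta)\}^{\varepsilon-1/2}] > 0 > \dot\ell[t,\ \beta + \{-\ddot\ell(t,\beta)\}^{\varepsilon-1/2}]$

holds with high probability for all $t$ satisfying $-\ddot\ell(t,\beta) > L$ when $L$ is sufficiently large. The remarks at the beginning of the proof show that this suffices.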
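The observation that opens this proof has a direct computational reading: because the score is nonincreasing in its argument, a sign change as in (3.72) brackets the maximum partial likelihood estimator, and bisection will find it. The sketch below is an illustration under simplifying assumptions that are not those of the chapter: time-fixed covariates, distinct death times, and all patients on a common clock. The data and function names are hypothetical.

```python
import math

def score(beta, times, events, z):
    """Cox partial-likelihood score (fixed covariates, distinct times)."""
    total = 0.0
    for i, died in enumerate(events):
        if not died:
            continue
        # Risk set: everyone still under observation at the i-th death time.
        risk = [j for j in range(len(times)) if times[j] >= times[i]]
        weights = [math.exp(beta * z[j]) for j in risk]
        mu = sum(w * z[j] for w, j in zip(weights, risk)) / sum(weights)
        total += z[i] - mu
    return total

def solve_score(times, events, z, lo, hi, tol=1e-10):
    """Bisection for the root of the score, given score(lo) > 0 > score(hi)."""
    if not (score(lo, times, events, z) > 0.0 > score(hi, times, events, z)):
        raise ValueError("score does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, times, events, z) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical toy data: six patients, all deaths observed.
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
events = [1, 1, 1, 1, 1, 1]
z = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
beta_hat = solve_score(times, events, z, lo=-5.0, hi=5.0)
```

The bracket check in `solve_score` is exactly condition (3.72) with the interval $[\beta - \delta, \beta + \delta]$ replaced by `[lo, hi]`; when it fails, the sketch refuses to return an estimate rather than guess.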


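Proposition 3.5. As $L \to \infty$,

$P[\,|\ddot\ell\{t,\hat\beta(t)\}\{\hat\beta(t) - \beta\} + \dot\ell(t,\beta)| < \{-\ddot\ell(t,\beta)\}^{3\varepsilon}$ for $-\ddot\ell(t,\beta) > L\,] \to 1$ ,

uniformly in $(B,\delta,\eta)$-experiments.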

Proof. Note that

(3.80)  $0 = \dot\ell\{t,\hat\beta(t)\} = \dot\ell(t,\beta) + \ddot\ell(t,\beta+\delta)\,\{\hat\beta(t) - \beta\}$ ,

where $\beta+\delta$ is between $\beta$ and $\hat\beta(t)$. Thus, the quantity in absolute value signs in Proposition 3.5 equals

(3.81)  $[\ddot\ell\{t,\hat\beta(t)\} - \ddot\ell(t,\beta+\delta)]\,\{\hat\beta(t) - \beta\}$ .

By Lemma 3.12, the absolute value of (3.81) is less than

(3.82)  $2B\{-\ddot\ell(t,\beta+\delta')\}\,\{\hat\beta(t) - \beta\}^2$ ,

where $\beta+\delta'$ is again between $\beta$ and $\hat\beta(t)$. Another application of Lemma 3.12 shows

(3.83)  $-\ddot\ell(t,\beta+\delta') \le e^{2B|\delta'|}\,\{-\ddot\ell(t,\beta)\}$ ,

so that (3.82) is less than

(3.84)  $2B\, e^{2B|\delta'|}\,\{-\ddot\ell(t,\beta)\}\,\{\hat\beta(t) - \beta\}^2$ .

Proposition 3.5 now follows from Proposition 3.4.

We  will  also  need  Lemma  3.13  when  we  prove  Theorem  3.7. 


Lemma 3.13. As $K \to \infty$,

$P[\,|\ddot\ell\{t,\hat\beta(t)\} - \ddot\ell(t,\beta)| < K + \{-\ddot\ell(t,\beta)\}^{(1+2\varepsilon)/2},\ t \ge 0\,] \to 1$ ,

uniformly in $(B,\delta,\eta)$-experiments.

Proof. An argument like the proof of Proposition 3.5 shows that as $L \to \infty$,

(3.85)  $P[\,|\ddot\ell\{t,\hat\beta(t)\} - \ddot\ell(t,\beta)| < \{-\ddot\ell(t,\beta)\}^{(1+2\varepsilon)/2}$ for $-\ddot\ell(t,\beta) > L\,] \to 1$ ,

uniformly in $(B,\delta,\eta)$-experiments. But

(3.86)  $|\ddot\ell\{t,\hat\beta(t)\} - \ddot\ell(t,\beta)| \le 2B^2 N(t)$

and it is easy to use the Chebyshev inequality to show that, as $K \to \infty$,

(3.87)  $P\{N(t) < K + 2D(t),\ t \ge 0\} \to 1$ .

By combining Propositions 3.2 and 3.3, we get

(3.88)  $P\{-\ddot\ell(t,\beta) + K > \tfrac{a}{2} D(t),\ t \ge 0\} \to 1$

as $K \to \infty$. Combining (3.86), (3.87), and (3.88) yields

(3.89)  $P[\,|\ddot\ell\{t,\hat\beta(t)\} - \ddot\ell(t,\beta)| < K + \tfrac{8B^2}{a}\{-\ddot\ell(t,\beta)\},\ t \ge 0\,] \to 1$ ,

as $K \to \infty$. Together, (3.85) and (3.89) imply the lemma.

3.5.  Approximation  of  the  Martingale  Q(t)  by  a  Brownian  Motion 

It remains to introduce the Brownian motion whose existence is claimed by Proposition 3.6. The most natural way of doing this would be to use Theorem A.3 to embed $Q(t)$ in a Brownian motion $W(\cdot)$. The result would be a continuous time analog of Proposition 2.1. Proposition 3.6 would then follow from a proof almost exactly like that of Theorem 2.2. Unfortunately, I don't know how to prove Theorem A.3, even though my intuition tells me it must be true. Therefore, in order to placate those narrow-minded readers who refuse to trust my intuition, I shall give a somewhat messier proof of Proposition 3.6 based on Theorem A.2, for which I purport to have a proof in the Appendix.

Proposition 3.6. On an enlarged version of our probability space, there exists a standard Brownian motion $W(\cdot)$ such that, as $K \to \infty$,

$P\{|Q(t) - W\{\langle Q\rangle(t)\}| < K + \langle Q\rangle(t)^{1/4+\varepsilon}$, for $t \ge 0\} \to 1$ .

This convergence holds uniformly in B-experiments.

Proof. The strategy will be as follows. First, a discretized version of $Q(t)$ will be defined. The predictable quadratic variation of this discrete time martingale is shown to be close to $\langle Q\rangle(t)$, the predictable quadratic variation process of $Q(t)$. Theorem A.2 is then used to embed the discrete time martingale in a Brownian motion. Finally, the proof of Theorem 2.2 is used to finish the proof of Proposition 3.6.

Define $\mathcal{F}_t$ stopping times $0 = t_0 < t_1 < \ldots$ by

(3.90)  $t_{k+1} = \inf\{t : t > t_k,\ \{\langle Q\rangle(t) - \langle Q\rangle(t_k)\} \vee |Q(t) - Q(t_k)| \ge 1\}$ .


Let $X_k = Q(t_k) - Q(t_{k-1})$, $\mathcal{F}_k = \mathcal{F}_{t_k}$, and $v_k = E(X_k^2 \mid \mathcal{F}_{k-1})$.

Note  that  Q^(t^)  "  is  an  martingale,  as  are 

,  k  9  k  „  k 

Q  (O  -  l  X  and  [  f  -  [  v,, 

K  i-1  1  i-1  1  i-1  1 


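It follows that

$\langle Q\rangle(t_k) - \sum_{i=1}^{k} v_i$

is an $\mathcal{F}_k$ martingale. Define

$\Delta\langle Q\rangle_k = \langle Q\rangle(t_k) - \langle Q\rangle(t_{k-1})$ .

Then since $\Delta\langle Q\rangle_k \le 1$,

(3.91)  $E\{(\Delta\langle Q\rangle_k - v_k)^2 \mid \mathcal{F}_{k-1}\} \le E\{(\Delta\langle Q\rangle_k)^2 \mid \mathcal{F}_{k-1}\} \le E\{\Delta\langle Q\rangle_k \mid \mathcal{F}_{k-1}\} = v_k$ .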


An application of Kolmogorov's inequality now shows that

(3.92)  $P\Big\{\Big|\langle Q\rangle(t_k) - \sum_{i=1}^{k} v_i\Big| < K + \Big(\sum_{i=1}^{k} v_i\Big)^{1/2+\varepsilon}$ for all $k \ge 0\Big\} \to 1$

as $K \to \infty$, uniformly in B-experiments.

It follows immediately from Theorem A.2 that there exists a standard Brownian motion $W(\cdot)$ and a sequence of random variables $0 = \tau_0 \le \tau_1 \le \tau_2 \le \ldots$ on an enlarged version of our probability space such that (3.93) holds. (I won't bother to put stars on everything this time.)


6 


(3.93a) 

\ 

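(3.93b)  $E(\tau_k - \tau_{k-1} \mid \mathcal{F}_{k-1}) = v_k$

(3.93c)  $\operatorname{var}(\tau_k - \tau_{k-1} \mid \mathcal{F}_{k-1}) \le 2(B+1)^2 v_k$

(3.93d)  $\tau_k$ is $\mathcal{F}_k$ measurable and the pre-$\tau_k$ σ-algebra of $W(\cdot)$ is contained in $\mathcal{F}_k$.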


The situation is now very similar to that of Theorem 2.2. We know that

(3.94)  $Q(t_k) = W(\tau_k)$ ,

and it remains to be shown that

(3.95)  $\tau_k - \langle Q\rangle(t_k)$

is sufficiently small so that $Q(t) - W\{\langle Q\rangle(t)\}$ is small. Note that (3.95) is an $\mathcal{F}_k$-martingale, and that by (3.91) and (3.93c)

(3.96)  $\operatorname{var}[\{\tau_k - \langle Q\rangle(t_k)\} - \{\tau_{k-1} - \langle Q\rangle(t_{k-1})\} \mid \mathcal{F}_{k-1}] \le 4(B+1)^2 v_k + 2v_k$ .

Define $k(u)$ as in (2.15):

(3.97)  $k(u) = \sup\Big\{k : \sum_{i=1}^{k} v_i \le u\Big\}$ .

By (3.96) and Kolmogorov's inequality, we get the following analog of (2.22) for $m = 1, 2, \ldots$ .

(3.98)  $P\Big\{\sup_{k \le k(2^m)} |\tau_k - \langle Q\rangle(t_k)| > 2^{m(1+\varepsilon)/2}\Big\} \le 2^{-m\varepsilon}\{4(B+1)^2 + 2\}$ .
k<k(2m) 

The rest of the proof is as in Theorem 2.2, except that at the end one has to refer to (3.92) to justify replacing $\big(\sum_{i=1}^{k} v_i\big)^{1/4+\varepsilon}$ by $\langle Q\rangle(t_k)^{1/4+\varepsilon}$.
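The arithmetic behind (3.98) is short enough to record. This is a sketch that takes the variance bound (3.96) and the definition (3.97) of $k(u)$ as given, and writes $M_k = \tau_k - \langle Q\rangle(t_k)$ for the martingale (3.95):

```latex
\begin{align*}
\sum_{k \le k(2^m)} \operatorname{var}\bigl(M_k - M_{k-1} \mid \mathcal{F}_{k-1}\bigr)
  &\le \{4(B+1)^2 + 2\} \sum_{k \le k(2^m)} v_k
   \le \{4(B+1)^2 + 2\}\, 2^m, \\
P\Bigl\{\sup_{k \le k(2^m)} |M_k| > 2^{m(1+\varepsilon)/2}\Bigr\}
  &\le \frac{\{4(B+1)^2 + 2\}\, 2^m}{2^{m(1+\varepsilon)}}
   = 2^{-m\varepsilon}\{4(B+1)^2 + 2\} .
\end{align*}
```

Since $\sum_m 2^{-m\varepsilon} < \infty$, the bound is summable in $m$, which is the form a Borel-Cantelli type argument needs.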

3.6.  The  Main  Theorem 

Theorem  3.7.  Let  0  <  e  <  15.  On  an  enlarged  version  of  our 
probability  space,  there  exists  a  standard  Brownian  motion  W(«)  such 
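that, as $K \to \infty$,

$P[\,|\dot\ell(t,\beta) - W\{-\ddot\ell(t,\beta)\}| < K + \{-\ddot\ell(t,\beta)\}^{1/2-\varepsilon}$ for all $t \ge 0\,] \to 1$

and, as $L \to \infty$,

$P[\,|[-\ddot\ell\{t,\hat\beta(t)\}]\{\hat\beta(t) - \beta\} - W[-\ddot\ell\{t,\hat\beta(t)\}]| < [-\ddot\ell\{t,\hat\beta(t)\}]^{1/2-\varepsilon}$ for $-\ddot\ell\{t,\hat\beta(t)\} > L\,] \to 1$ .

The convergence is uniform in $(B,\delta,\eta)$-experiments, and for $\beta$ in compact intervals.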

Proof.  Let  0  <  e  <  e'  <  e"  <  e"'  <  yjr .  Since  Propositions 
3.1  and  3.2  hold  with  e  replaced  by  e"’.  Propositions  3.1,  3.2, 
and  3.3  imply 

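(3.99)  $P[\,|r(t)| < K + \{-\ddot\ell(t,\beta)\}^{1/2-\varepsilon''},\ t \ge 0\,] \to 1$

and

(3.100)  $P[\,|\ddot\ell(t,\beta) + \langle Q\rangle(t)| < K + \{-\ddot\ell(t,\beta)\}^{1-2\varepsilon''},\ t \ge 0\,] \to 1$

as $K \to \infty$, where the convergence is of course uniform in $(B,\delta,\eta)$-experiments. A proof like that of Theorem 2.2 together with (3.100) implies that as $K \to \infty$,

(3.101)  $P[\,|W\{\langle Q\rangle(t)\} - W\{-\ddot\ell(t,\beta)\}| < K + \{-\ddot\ell(t,\beta)\}^{1/2-\varepsilon'},\ t \ge 0\,] \to 1$ .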


The  first  part  of  Theorem  3.7  now  follows  from  (3.99),  (3.100),  (3.101), 
and  Proposition  3.6. 

The  second  part  of  Theorem  3.7  follows  easily  from  Proposition 
3.5,  Lemma  3.13  at  the  end  of  Section  3.4,  and  the  first  part  of  the 
theorem. 

The operational conclusions to be drawn from Theorem 3.7 are the same as those which were drawn for the case of simultaneous entry at the end of Chapter II. Again, the behavior of

$\dot\ell(t,\beta)$

and of

$[-\ddot\ell\{t,\hat\beta(t)\}]\,\{\hat\beta(t) - \beta\}$

in information time is approximately that of a standard Brownian motion until a possibly infinite stopping time. The problem of sequentially estimating $\beta$ or testing $H_0: \beta = \beta_0$ is asymptotically equivalent to the problem of estimating or testing the drift of a possibly stopped Brownian motion.


The sharp-eyed reader will have noticed that the exponent $\tfrac14 + \varepsilon$ found in (2.25) and in Theorem 2.3 of Chapter II has been replaced by $\tfrac12 - \varepsilon$ in Theorem 3.7. This difference is due to the large exponents of $D(t)$ which are present in Propositions 3.1 and 3.2. However, it was argued in Sections 3.2 and 3.3 that $\dot\ell(t,\beta)$ and $-\ddot\ell(t,\beta)$ are in fact much more closely approximated by $Q(t)$ and $\langle Q\rangle(t)$ than is claimed in Propositions 3.1 and 3.2. Thus, it is possible that Theorem 3.7 remains true when the exponent $\tfrac12 - \varepsilon$ is replaced by $\tfrac14 + \varepsilon$, and the Brownian motion approximation may be almost as good in the case of staggered entry as it is in the case of simultaneous entry. It seems desirable to conduct a Monte Carlo experiment to get some feeling for the practical limitations of Theorem 3.7. For the problem of testing the null hypothesis $\beta = 0$, Gail, DeMets, and Slud (1981) conclude that the score statistic under the null hypothesis is reasonably approximated by a Brownian motion. Their time renormalization is not appropriate for general $\beta$, however.

3.7.  Multidimensional  Covariates 

It is easy to generalize the notation and formulation of the model given in Section 3.1 to the case of $p$-dimensional $\beta$ and $z_i(\cdot)$, $p \ge 2$. The only nonobvious change is in the technical assumption given previously in formulas (3.3) and (3.4). For $p$-dimensional covariate processes $z_i(\cdot)$ we will assume that there exist positive numbers $\delta$ and $\eta$ such that

(3.102)  $\Lambda(\delta) > 0$

and

(3.103)  $\|\operatorname{cov}\{z_i(s) \mid x_i \wedge c_i \ge s\}\|_{\min} \ge \eta$ , for $0 \le s \le \delta$ ,

where $\|M\|_{\min}$ denotes the minimum eigenvalue of the matrix $M$. The $p$-dimensional martingale $Q(t)$ and its $p \times p$ matrix-valued predictable quadratic covariation process $\langle Q\rangle(t)$ are easy generalizations of the one-dimensional versions. It also seems that Propositions 3.1 through 3.5 go through essentially as before. Difficulty is encountered when one attempts to generalize Proposition 3.6. The author is not aware of any techniques for embedding a $p$-dimensional martingale in a $p$-dimensional Brownian motion.

Cox (1963) and Whitehead (1978) discuss large sample sequential tests of hypotheses in the presence of nuisance parameters, and it seems natural to adopt a similar approach here. Suppose that the first coordinate of the $p$-vector $\beta$ is of primary interest, with the other coordinates being regarded as nuisance parameters. This would typically be the case when the first coordinate is a treatment indicator in a clinical trial comparing two treatments. Let us adopt the notation of Cox (1963) and Whitehead (1978) and use $\theta \in \mathbb{R}$ to refer to the first coordinate of $\beta$ and $\phi = (\phi_1, \ldots, \phi_{p-1})$ to refer to the other coordinates. Let $\ell(t,\theta,\phi)$ be the log partial likelihood at time $t$. The derivatives of $\ell(t,\theta,\phi)$ are written in the usual way as $\ell_\theta(t,\theta,\phi)$, $\ell_\phi(t,\theta,\phi)$, $\ell_{\theta\theta}(t,\theta,\phi)$, etc. Let $1/\ell^{\theta\theta}(t,\theta,\phi)$ be the leading element of the inverse of the matrix of second derivatives, so that

(3.104)  $\ell^{\theta\theta} = \ell_{\theta\theta} - (\ell_{\theta\phi})^T (\ell_{\phi\phi})^{-1} (\ell_{\theta\phi})$

(cf. Whitehead (1978), p. 352). Recall that in the one-dimensional case, our information time $-\ddot\ell(t,\beta)$ was approximately equal to the reciprocal of the variance of $\hat\beta(t)$. Here, the variance of $\hat\theta(t)$, the maximum partial likelihood estimator of $\theta$, is approximately $-1/\ell^{\theta\theta}(t,\theta,\phi)$, so that the natural approach is to use $-\ell^{\theta\theta}(t,\theta,\phi)$ as an information time. Define, for $u \ge 0$,

(3.105)  $\tau(u) = \inf\{t : -\ell^{\theta\theta}(t,\theta,\phi) \ge u\}$ .

Assume that there are infinitely many entry times, so that $\tau(u)$ is finite for all $u \ge 0$.
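The relation between the leading element of the inverse Hessian and the quantity in (3.104) is the standard Schur complement identity. The sketch below checks it in the simplest case $p = 2$, where the nuisance block is a single number. The function names and the numerical Hessian entries are hypothetical and chosen only so that the matrix is negative definite.

```python
def leading_inverse_element(l_tt: float, l_tp: float, l_pp: float) -> float:
    """Leading element of the inverse of [[l_tt, l_tp], [l_tp, l_pp]]."""
    det = l_tt * l_pp - l_tp * l_tp
    return l_pp / det

def profile_information(l_tt: float, l_tp: float, l_pp: float) -> float:
    """Schur complement l_tt - l_tp * l_pp^{-1} * l_tp, as in (3.104) for p = 2."""
    return l_tt - l_tp * l_tp / l_pp

# Hypothetical second derivatives of a log partial likelihood.
l_tt, l_tp, l_pp = -4.0, 1.5, -3.0
lead = leading_inverse_element(l_tt, l_tp, l_pp)
schur = profile_information(l_tt, l_tp, l_pp)

# The leading element of the inverse is the reciprocal of the Schur complement.
assert abs(lead * schur - 1.0) < 1e-12
# For a negative definite Hessian the information time -schur is positive.
assert -schur > 0.0
```

With these numbers the information time is $3.25$, which is smaller than $-\ell_{\theta\theta} = 4$: estimating the nuisance coordinate costs information about $\theta$.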

Preliminary Theorem 3.14. Suppose that we have a sequence of $(B,\delta,\eta)$-experiments indexed by $n$. Fix $u^* > 0$. Then as $n \to \infty$,

(3.106)  $(\cdot)\, n^{1/2}\,[\hat\theta\{\tau((\cdot)n)\} - \theta] \Rightarrow W(\cdot)$

on $[u^*,\infty)$, where $W(\cdot)$ is standard Brownian motion.

It is not hard to show that the convergence of (3.106) holds for a single, fixed $u$, but the remainder of the proof has not yet been worked out.

APPENDIX 


A. 1.  Basic  Facts  About  Martingales 

Chapter 2 of Gill (1980) contains a nice summary of the facts about martingales and stochastic integrals used in this dissertation.
A  more  detailed  development  is  found  in  Liptser  and  Shiryayev  (1977) 
and  Liptser  and  Shiryayev  (1978).  However,  for  the  convenience  of 
the  reader,  we  will  include  a  very  brief  review,  most  of  which  is 
paraphrased  from  Gill  (1980). 

Let $(\Omega, \mathcal{F}, P)$ be a complete probability space. Let $\{\mathcal{F}_t,\ t \in [0,\infty)\}$
be a right-continuous, increasing family of σ-algebras.
Generally, $\mathcal{F}_t$ is thought of as being generated by events occurring
in the time interval [0,t]. A stochastic process $X(t) = X(t,\omega)$,
$t \in [0,\infty)$, is said to be adapted to $\{\mathcal{F}_t\}$ if X(t) is $\mathcal{F}_t$-measurable
for each t. A process $F(t,\omega)$ is defined by Gill (1980) to
be $\mathcal{F}_t$-predictable if, as a function on $[0,\infty) \times \Omega$, it is measurable
with respect to the σ-algebra generated on $[0,\infty) \times \Omega$ by all adapted
processes with left-continuous paths. An adapted process M(t) is
an $\mathcal{F}_t$-martingale if it is right-continuous with left-hand limits and
satisfies

$E\{M(t) \mid \mathcal{F}_s\} = M(s)$ ,  $s < t$ .

A martingale M is said to be square-integrable if

$\sup_{t \in [0,\infty)} E(M^2(t)) < \infty$ .

If M is a square-integrable martingale, it follows easily from
Jensen's inequality that $M^2$ satisfies

$E\{M^2(t) \mid \mathcal{F}_s\} \ge M^2(s)$ ,  $s < t$ ,

so that $M^2$ is a submartingale. By the Doob-Meyer decomposition
theorem (see Liptser and Shiryayev (1977), Corollary and Note on
page 68), there exists a predictable, increasing process $\langle M \rangle(t)$
such that

$M^2(t) - \langle M \rangle(t)$

is a martingale. The process $\langle M \rangle(t)$ will be referred to in this
dissertation as the predictable quadratic variation process of M(t).

Let X(t) and Y(t) be stochastic processes with piecewise
continuous paths. Suppose further that the paths of X(t) are of
bounded variation on bounded intervals. Then one can define the
process

$U(t) = \int_{s \in [0,t]} Y(s)\, dX(s)$

by taking Stieltjes integrals pathwise. If X(t) is a square-integrable
martingale, and Y(t) is bounded and predictable, then
U(t) is itself a square-integrable martingale with predictable
quadratic variation process

$\langle U \rangle(t) = \int_{s \in [0,t]} Y^2(s)\, d\langle X \rangle(s)$ .
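A discrete-time analog may help fix ideas. In the Python sketch below, X is a simple symmetric random walk (so its predictable quadratic variation grows by 1 per step), Y depends only on the past of X, and the sum of Y times the increments of X stands in for the stochastic integral. The variable names and the particular choice of Y are assumptions made for illustration. The simulation checks by Monte Carlo that U has mean 0 and that the mean of U squared matches the mean of the sum of Y squared.

```python
import numpy as np

rng = np.random.default_rng(2)
n_steps, n_paths = 20, 100_000

# Increments of X: independent +1/-1 steps, so d<X> = 1 at every step.
steps = rng.choice([-1.0, 1.0], size=(n_paths, n_steps))
x = np.cumsum(steps, axis=1)
# Value of X just before each step (X starts at 0).
x_prev = np.hstack([np.zeros((n_paths, 1)), x[:, :-1]])

# Bounded and predictable integrand: uses only the past of X.
y = np.where(x_prev >= 0, 1.0, 0.5)

u = np.sum(y * steps, axis=1)    # discrete analog of U = integral of Y dX
qv = np.sum(y ** 2, axis=1)      # discrete analog of <U> = integral of Y^2 d<X>

mean_u = u.mean()
mean_u_sq = np.mean(u ** 2)
mean_qv = qv.mean()
```

Up to simulation error, `mean_u` is close to 0 and `mean_u_sq` is close to `mean_qv`.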

According to Gill (1980), p. 9, "a multivariate counting process
$N = (N_i:\ i=1, \ldots, r)$ is a finite family of adapted processes
such that for almost all $\omega \in \Omega$, the paths of $N_1, \ldots, N_r$ are
nondecreasing, right continuous, integer-valued functions, zero at
time zero, and with jumps of size +1 only, no two processes jumping
at the same time." Furthermore, "there exist right continuous,
nondecreasing, predictable processes $A_i$, zero at time zero, such
that

$M_i = N_i - A_i$ ,  $i=1, \ldots, r$

are local martingales" (Gill (1980), p. 12). These definitions are
satisfied by the $N_i$'s and $A_i$'s defined on the bottom of page 40
in Chapter III. Since in our case the $M_i$'s are locally square-integrable
and the $A_i$'s are continuous, it follows from Theorem
2.3.1 on page 12 of Gill (1980) that

$\langle M_i \rangle(t) = A_i(t)$

and that the product $M_i(t) M_j(t)$ is a martingale when $i \ne j$. These
results are used repeatedly in Chapter III.
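The simplest instance of these facts is a pair of independent Poisson processes with constant intensities, for which the compensators are deterministic, A_i(t) = rate_i * t, and almost surely no two processes jump at the same time. The Python sketch below is an illustration under that assumption only (the rates, the horizon and the variable names are hypothetical, and this is not the counting process of Chapter III). It checks by simulation that the mean of M_1(t) squared is near A_1(t) and that the mean of the product M_1(t) M_2(t) is near 0.

```python
import numpy as np

rng = np.random.default_rng(0)
rate_1, rate_2, t = 2.0, 3.0, 5.0   # hypothetical intensities and time horizon
n_paths = 200_000

# N_i(t) is Poisson with mean A_i(t) = rate_i * t, and M_i = N_i - A_i.
m1 = rng.poisson(rate_1 * t, size=n_paths) - rate_1 * t
m2 = rng.poisson(rate_2 * t, size=n_paths) - rate_2 * t

mean_m1_sq = np.mean(m1 ** 2)     # compare with A_1(t) = rate_1 * t = 10
mean_product = np.mean(m1 * m2)   # compare with 0
```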

A.2. Central Limit and Embedding Theorems for Martingales

Let $\{M_n(t);\ n=1, 2, \ldots\}$ be a sequence of square-integrable
martingales on [0,1]. Let V be a continuous, increasing real
function on [0,1] with V(0) = 0. Let M be a Gaussian, independent-increments
process on [0,1] satisfying

$E\,M(t) = 0$ and $E\,M^2(t) = V(t)$ ,  $t \in [0,1]$ .

Then the following theorem is an easy corollary of Proposition 1 of
Rebolledo (1980).


Theorem A.1. Suppose that the size of the largest jump of
$M_n$ on [0,1] is bounded by $c_n$, where $c_n \to 0$ as $n \to \infty$. Suppose
further that

$\langle M_n \rangle(t) \xrightarrow{P} V(t)$

as $n \to \infty$, for all $t \in [0,1]$. Then

$M_n \Rightarrow M$ as $n \to \infty$ .
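A familiar special case of Theorem A.1 is the rescaled, compensated Poisson process M_n(t) = n^{-1/2} (N(nt) - nt), where N is a unit-rate Poisson process. Its jumps have size n^{-1/2}, its predictable quadratic variation is t, and the limit is standard Brownian motion with V(t) = t. The Python sketch below is a Monte Carlo illustration under these assumptions (the value of n and all variable names are hypothetical). It samples M_n at t = 1/2 and t = 1 and compares the empirical variances and covariance with V(1/2) = 1/2 and V(1) = 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_paths = 400, 100_000

# Counts of a unit-rate Poisson process on [0, n/2] and on (n/2, n].
first_half = rng.poisson(n / 2, size=n_paths)
second_half = rng.poisson(n / 2, size=n_paths)

m_half = (first_half - n / 2) / np.sqrt(n)              # M_n(1/2)
m_one = (first_half + second_half - n) / np.sqrt(n)     # M_n(1)

largest_jump = 1 / np.sqrt(n)            # the bound c_n on the jump size
var_half = m_half.var()                  # compare with V(1/2) = 0.5
var_one = m_one.var()                    # compare with V(1) = 1
cov_half_one = np.mean(m_half * m_one)   # compare with V(1/2) = 0.5
```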

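Suppose we have a discrete-time martingale difference sequence
$\{X_k,\ k=0, 1, \ldots\}$ on $\{\Omega, \mathcal{F}, P\}$ such that

(A.1)  $X_0 = 0$

and

(A.2)  $|X_k| \le B$ for all k .

Define

(A.3)  $v_k = E(X_k^2 \mid \mathcal{F}_{k-1})$ .

Let $A_1, A_2, \ldots$ be a sequence of i.i.d. random variables, uniformly
distributed on [0,1], on a different probability space $(X, \mathcal{A}, \mu)$.
Let

$\{\Omega^*, \mathcal{F}^*, P^*\} = \{\Omega \times X,\ \mathcal{F} \times \mathcal{A},\ P \times \mu\}$

be the product space, and let $\mathcal{F}^*_k = \mathcal{F}_k \times \mathcal{A}_{2k}$, where $\mathcal{A}_k = \sigma\{A_1, \ldots, A_k\}$.
Let $X^*_k$, $v^*_k$, and $A^*_k$ be $X_k$, $v_k$, and $A_k$ considered as random
variables on $\Omega^*$.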


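Theorem A.2. (Skorokhod embedding for discrete-time martingales).
There exists a standard Brownian motion $W(\cdot)$ and a sequence of random
variables $0 = \tau_0 \le \tau_1 \le \tau_2 \le \ldots$ on $\{\Omega^*, \mathcal{F}^*, P^*\}$ such that (A.4)
holds.

(A.4a)  $X^*_k = W(\tau_k) - W(\tau_{k-1})$

(A.4b)  $E(\tau_k - \tau_{k-1} \mid \mathcal{F}^*_{k-1}) = v^*_k$

(A.4c)  $\mathrm{var}(\tau_k - \tau_{k-1} \mid \mathcal{F}^*_{k-1}) \le 2B^2 v^*_k$

(A.4d)  $\tau_k$ is $\mathcal{F}^*_k$-measurable and the pre-$\tau_k$ σ-algebra
of $W(\cdot)$ is contained in $\mathcal{F}^*_k$ .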

Remark. Theorem A.2 is very similar to the presentation on pages
90-92 in Freedman (1971). However, Freedman commits a serious error
in his definition of the stopping times on page 92. See also
Theorem A.1 on page 269 of Hall and Heyde (1980).

Proof. Condition on $\mathcal{F}^*_k$. Arguing as in Freedman (1971), pp.
68-70, one can show that there exist nonnegative functions $U_{k+1}(\cdot,\cdot)$
and $V_{k+1}(\cdot,\cdot)$ on $R \times [0,1]$ such that

(A.5)  $U_{k+1}(X^*_{k+1}, A^*_{2k+1}) = X^*_{k+1}$ for $X^*_{k+1} \ge 0$ ,

(A.6)  $-V_{k+1}(X^*_{k+1}, A^*_{2k+1}) = X^*_{k+1}$ for $X^*_{k+1} \le 0$ ,

and

(A.7)  $\mathcal{L}(X^*_{k+1} \mid U_{k+1}, V_{k+1}) = G(U_{k+1}, V_{k+1})$ .


Here, G(u,v) is the mean 0 probability distribution with all mass
on u and -v. Also, $U_{k+1}$ has been written for $U_{k+1}(X^*_{k+1}, A^*_{2k+1})$,
and $V_{k+1}$ has been written for $V_{k+1}(X^*_{k+1}, A^*_{2k+1})$. The random
variable $A^*_{2k+1}$ is necessary only if the distribution of $X^*_{k+1}$,
conditional on $\mathcal{F}^*_k$, has atoms. The idea is that the distribution
of $X^*_{k+1}$, given $\mathcal{F}^*_k$, can be decomposed into a mixture of two-point
distributions, each with mean 0.
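For reference, the two-point distribution G(u,v) can be written out explicitly. The short calculation below, which assumes u > 0 and v > 0 and writes X for a random variable with distribution G(u,v), shows that the mean-zero requirement fixes the two masses and that the second moment is uv, the same product that appears as the expected exit time of Brownian motion from the interval (-v, u).

```latex
\begin{align*}
P(X = u) &= \frac{v}{u+v}, \qquad P(X = -v) = \frac{u}{u+v}, \\
E\,X &= u \cdot \frac{v}{u+v} - v \cdot \frac{u}{u+v} = 0, \\
E\,X^2 &= u^2 \cdot \frac{v}{u+v} + v^2 \cdot \frac{u}{u+v} = uv .
\end{align*}
```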


Now condition on $\mathcal{F}^*_k$, $A^*_{2k+1}$, and $X^*_{k+1}$. The random variable
$A^*_{2k+2}$ can be used to construct a Brownian motion $W_{k+1}(\cdot)$ which is conditioned
to hit $U_{k+1}$ before $-V_{k+1}$ if $X^*_{k+1} \ge 0$ and which is conditioned
to hit $-V_{k+1}$ before $U_{k+1}$ if $X^*_{k+1} < 0$ (cf. Karlin and Taylor
(1975), p. 378). Let

(A.8)  $s_{k+1} = \inf\{s > 0:\ W_{k+1}(s) = U_{k+1} \text{ or } -V_{k+1}\}$ .


Then conditional on $(\mathcal{F}^*_k, U_{k+1}, V_{k+1})$, $W_{k+1}(\cdot)$ is a standard Brownian
motion and

(A.9)  $E(s_{k+1} \mid \mathcal{F}^*_k, U_{k+1}, V_{k+1}) = U_{k+1} V_{k+1}$ .
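The product form in (A.9) has an exact discrete analog: a simple symmetric random walk started at 0 and stopped on first hitting a or -b (a, b positive integers) has expected stopping time ab and hits a first with probability b/(a+b). The Python sketch below verifies this by solving the first-step equations as a linear system. It is an illustration for the random walk only, not a simulation of Brownian motion, and the function name `exit_stats` and the values a = 3, b = 5 are hypothetical.

```python
import numpy as np

def exit_stats(a, b):
    """Simple symmetric random walk started at 0 and stopped on hitting a or -b.
    Returns (expected exit time, probability of hitting a before -b)."""
    states = list(range(-b + 1, a))              # interior states
    index = {x: i for i, x in enumerate(states)}
    n = len(states)
    system = np.eye(n)                           # identity minus half the adjacency
    rhs_time = np.ones(n)                        # one unit of time per step
    rhs_hit = np.zeros(n)
    for x in states:
        i = index[x]
        for y in (x - 1, x + 1):
            if y in index:
                system[i, index[y]] -= 0.5
            elif y == a:
                rhs_hit[i] += 0.5                # absorbed at the upper barrier
    expected_time = np.linalg.solve(system, rhs_time)
    hit_prob = np.linalg.solve(system, rhs_hit)
    return expected_time[index[0]], hit_prob[index[0]]

mean_time, p_upper = exit_stats(3, 5)
```

For these values the solution gives an expected exit time of 3 * 5 = 15 and an upper-barrier probability of 5/8, matching the two-point distribution with masses proportional to the opposite barrier distances.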


Also, by Lemma (146) of Freedman (1971), p. 92,

(A.10)  $E(s_{k+1}^2 \mid \mathcal{F}^*_k, U_{k+1}, V_{k+1}) \le 2B^2\, U_{k+1} V_{k+1}$ .


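Since

(A.11)  $E(U_{k+1} V_{k+1} \mid \mathcal{F}^*_k) = E(X^{*2}_{k+1} \mid \mathcal{F}^*_k) = v^*_{k+1}$ ,

(cf. Freedman (1971), p. 70), it follows that

(A.12a)  $X^*_{k+1} = W_{k+1}(s_{k+1})$

(A.12b)  $E(s_{k+1} \mid \mathcal{F}^*_k) = v^*_{k+1}$

(A.12c)  $\mathrm{var}(s_{k+1} \mid \mathcal{F}^*_k) \le 2B^2 v^*_{k+1}$ .

(A.12d)  The σ-algebra generated by $W_{k+1}(\cdot)$ is
contained in $\mathcal{F}^*_{k+1}$ .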

Now define the sequence $0 = \tau_0 \le \tau_1 \le \tau_2 \le \ldots$ by

(A.13)  $\tau_k = \sum_{i=1}^{k} s_i$ ,

and define $W(\cdot)$ by

(A.14)  $W(t) = \sum_{i=1}^{k} X^*_i + W_{k+1}(t - \tau_k)$

for $\tau_k \le t < \tau_{k+1}$. It is easy to see that the pre-$\tau_k$ σ-algebra of
$W(\cdot)$ is contained in $\mathcal{F}^*_k$, and that $\tau_k$ is $\mathcal{F}^*_k$-measurable. Conditional
on $\{\mathcal{F}^*_k, U_{k+1}, V_{k+1}\}$, $W_{k+1}(\cdot)$ is a standard Brownian motion
and $s_{k+1}$ is a stopping time for this Brownian motion. Since the
construction of $W(\cdot)$ just amounts to patching together stopped
Brownian motions, each of which is independent of the σ-algebra
generated by the previous ones, $W(\cdot)$ is itself a standard Brownian
motion on $[0, \sum_{i=1}^{\infty} s_i)$. This, together with (A.12), proves the theorem.

Finally, let us state the continuous-time analog of Theorem A.2.
Let $\{M(t), \mathcal{F}_t;\ t \ge 0\}$ be a square-integrable martingale on $\{\Omega, \mathcal{F}, P\}$.
Suppose that the jumps of $M(\cdot)$ are of size < B.


Theorem A.3. (Skorokhod embedding for continuous-time martingales).
On an enlarged version $\{\Omega^*, \mathcal{F}^*, P^*\}$ of $\{\Omega, \mathcal{F}, P\}$ there exists a family
of increasing random variables $\{\tau_t;\ t \ge 0\}$ and a standard Brownian
motion $W(\cdot)$ such that

(A.15a)  $M^*(t) = W(\tau_t)$

(A.15b)  $\tau_t - \langle M^* \rangle(t)$ is an $\mathcal{F}^*_t$-martingale

(A.15c)  $\langle \tau - \langle M^* \rangle \rangle(t) \le 2B^2\, \langle M^* \rangle(t)$

(A.15d)  $\tau_t$ and the pre-$\tau_t$ σ-algebra of $W(\cdot)$
are $\mathcal{F}^*_t$-measurable.

Remark. I don't know how to prove Theorem A.3. The interested
reader may wish to compare Theorem A.3 with the embedding theorems of
Monroe (1972, 1978).


REFERENCES 


Aalen, O. O. (1977). Weak Convergence of Stochastic Integrals Related
to Counting Processes. Z. Wahrscheinlichkeitsth. verw. Geb. 38,
261-277. Correction: Vol. 48 (1979), 347.

Aalen, O. O. (1978). Nonparametric Inference for a Family of Counting
Processes. Ann. Statist. 6, 701-726.

Aalen, O. O. (1980). A Model for Nonparametric Regression Analysis of
Counting Processes. Springer Lecture Notes in Statistics 2, 1-25.

Andersen, P. K. and Gill, R. (1981). Cox's Regression Model for Counting
Processes: A Large Sample Study. Submitted to Ann. Statist.

Armitage,  P.  (1975).  Sequential  Medical  Trials,  2nd  edition.  Blackwell 
Scientific  Publications,  Oxford. 

Bailey,  K.  R.  (1979).  The  General  Maximum  Likelihood  Approach  to  the 
Cox  Regression  Model.  Ph.D.  Dissertation,  University  of  Chicago, 
Chicago,  Illinois. 

Cox, D. R. (1963). Large Sample Sequential Tests for Composite Hypotheses.
Sankhya A 25, 5-12.

Cox,  D.  R.  (1972).  Regression  Models  and  Life  Tables  (with  discussion). 
J.  Roy.  Statist.  Soc.  B  34,  187-220. 

Cox,  D.  R.  (1975).  Partial  Likelihood.  Biometrika  62,  269-276. 

Doob, J. L. (1953). Stochastic Processes. John Wiley & Sons, New York.

Efron, B. (1977). The Efficiency of Cox's Likelihood Function for
Censored Data. J. Amer. Statist. Assn. 72, 557-565.

Feller, W. (1971). An Introduction to Probability Theory and Its
Applications Vol. II, 2nd edition. John Wiley & Sons, New York.

Freedman, D. (1971). Brownian Motion and Diffusion. Holden-Day,
San Francisco.

Gail, M., DeMets, D., and Slud, E. (1981). Simulation Studies on
Increments of the Two-Sample Log Rank Test for Survival Data
with Application to Group Sequential Boundaries. Proceedings
of Special Topic IMS Meeting on Survival Statistics, Columbus,
Ohio, October 1981.

Gill,  R.  D.  (1980).  Censoring  and  Stochastic  Integrals.  Mathematical 
Centre  Tracts  124,  Mathematisch  Centrum,  Amsterdam. 




Hall,  P.  and  Heyde,  C.  C.  (1980).  Martingale  Limit  Theory  and  Its 
Application.  Academic  Press,  New  York. 

Jones, D. and Whitehead, J. (1979). Sequential Forms of the Log Rank
and Modified Wilcoxon Test for Censored Data. Biometrika 66,
105-113.


Karlin,  S.  and  Taylor,  H.  M.  (1975).  A  First  Course  in  Stochastic 
Processes,  2nd  edition.  Academic  Press,  New  York. 

Liptser,  R.  S.  and  Shiryayev,  A.  N.  (1977).  Statistics  of  Random 
Processes  I.  Springer-Verlag,  New  York-Heidelberg-Berlin. 

Liptser,  R.  S.  and  Shiryayev,  A.  N.  (1978).  Statistics  of  Random 
Processes  II.  Springer-Verlag,  New  York-Heidelberg-Berlin. 

Monroe,  I.  (1972).  On  Embedding  Right  Continuous  Martingales  in 
Brownian  Motion.  Ann.  Math.  Statist.  43,  1293-1311. 

Monroe, I. (1978). Processes that can be Embedded in Brownian Motion.
Ann. Prob. 6, 42-56.

Rebolledo, R. (1980). Central Limit Theorem for Local Martingales.
Z. Wahrscheinlichkeitsth. verw. Geb. 51, 269-286.

Slud, E. (1982). Sequential Linear Rank Tests for Two-Sample Censored
Survival Data. Submitted to Ann. Statist.

Tsiatis,  A.  (1981a).  A  Large  Sample  Study  of  Cox's  Regression  Model. 
Ann.  Statist.  9,  93-108. 

Tsiatis, A. (1981b). The Asymptotic Joint Distribution of the Efficient
Scores Test for the Proportional Hazards Model Calculated
Over Time. Biometrika 68, 311-315.

Whitehead, J. (1978). Large Sample Sequential Methods with Application
to the Analysis of 2x2 Contingency Tables. Biometrika 65, 351-356.


UNCLASSIFIED

REPORT DOCUMENTATION PAGE

TITLE (and Subtitle): LARGE SAMPLE THEORY FOR SEQUENTIAL ANALYSIS OF THE PROPORTIONAL HAZARDS MODEL
AUTHOR: THOMAS SELLKE
TYPE OF REPORT: TECHNICAL REPORT
CONTRACT OR GRANT NUMBER: N00014-77-C-0306
PERFORMING ORGANIZATION NAME AND ADDRESS: Department of Statistics, Stanford University, Stanford, CA 94305
SECURITY CLASS. (of this report): UNCLASSIFIED
DISTRIBUTION STATEMENT (of this Report): APPROVED FOR PUBLIC RELEASE: DISTRIBUTION UNLIMITED.
KEY WORDS: PROPORTIONAL HAZARDS MODEL, SEQUENTIAL ANALYSIS.
ABSTRACT: PLEASE SEE REVERSE SIDE.



REPORT  #  20 


SUMMARY 


An appropriate large sample theory for sequential analysis of the Cox
proportional hazards model is developed. For clinical trials with simultaneous
entry of patients, the efficient score process of the partial likelihood is
easily seen to be a martingale. It follows that, in a time scale based on
the observed Fisher information, the score process and the properly normalized
maximum partial likelihood estimator behave asymptotically like Brownian
motion. When entry is staggered, the efficient score process is no longer
a martingale in general. However, if patients in a staggered-entry clinical
trial are assumed to be independent and identically distributed, independently
of entry time, then the score process is well approximated by a martingale.
The asymptotic results involving weak convergence to Brownian motion hold as
before.



